Abstract

The primary goal of this research was to determine whether Artificial Neural Networks
(ANNs) can be trained to classify the correlation signatures of direct sequence and
frequency-hopped spread-spectrum signals. Secondary goals were to determine (1)
whether network classification performance can be modeled with a conditional probability
matrix, (2) whether the symmetry of the matrices can be controlled, and (3) whether using
a majority vote rule over independently trained networks improves classification
performance.

Correlation signatures of the spread-spectrum signals were obtained from the United
States Army Harry Diamond Laboratories. The signatures were preprocessed and
separated into various training and testing data sets. Thirty samples of network
responses for several sets of training conditions were gathered using a neural network
simulator.

ANNs trained directly on correlation signature data yielded classification
accuracies on test data at or near 80%. The probability matrices were stationary with
regard to test sets, and the ability to shift the symmetry of the matrices was
demonstrated. Classification accuracy could be improved by majority vote if the
nets were trained on different data sets. An average improvement of 1.8% was found
to be statistically significant at the α = 0.05 level. A metric was developed to estimate
the similarity of the solutions found by networks in a given training run.
CLASSIFICATION OF ACOUSTO-OPTIC
CORRELATION SIGNATURES OF SPREAD
SPECTRUM SIGNALS USING ARTIFICIAL
NEURAL NETWORKS


I. Introduction


1.1 Historical Background.

Over the past two decades, use of spread-spectrum systems in the Department
of Defense has become increasingly common. This is primarily due to the militarily
desirable antijam (AJ), anti-interference, and low probability of intercept (LPI)
characteristics of these signals [1]. It is reasonable to assume that our adversaries
will use spread-spectrum systems in any future conflict. It follows that
we should prepare to defeat the AJ and LPI characteristics of these signals in order
to deny our enemy free use of the electromagnetic spectrum.

In order to disrupt a hostile communications signal, we must be able to detect
and classify the signal. Much of the current research is focused on solving this
problem. Toward that end, researchers at the U.S. Army Harry Diamond Laboratories
have developed a one-dimensional time-integrating acousto-optic (AO) correlator
capable of 10¹⁵ operations per second [2]. The device performs the correlation transform
several orders of magnitude faster than even the most powerful digital computers.
This correlator can be used in an intercept receiver to detect and capture the
correlation signatures of spread-spectrum signals. Currently, the output of the correlator
is either examined directly by a human operator or digitized and run through curve-fitting
routines on digital computers. The objective is to quickly recognize the correlation
signature and obtain information about its time domain modulation characteristics.

Researchers at U.S. Army Harry Diamond Laboratories have suggested using
Artificial Neural Networks (ANNs) to classify spread-spectrum communications
signals. The ANN would be trained to recognize and classify specific input features
present in the correlation signatures of several types of spread-spectrum signals.
These features correspond to time domain characteristics of the signal, so classifying
the signals in this manner essentially extracts information about the modulation
parameters used to construct the signal. In an eventual hardware implementation,
the network would operate in real time at the output of an AO correlator. In a
detection scenario, a priori known features would be used to train the net, and the net
would send an alarm upon detection of those features. The combined correlator and
network package would then be mounted in a remotely piloted vehicle (RPV) and flown
over an area of interest, as shown in Figure 1.1. A first step in this effort is to
construct and train an ANN to determine whether a particular input spread-spectrum
correlation signature belongs to a direct sequence (DS) or a frequency-hopped (FH) signal.

Figure 1.1. Detection Scenario: RPV-borne Intercept Package [2]

1.2 Problem Statement.

The objective of this thesis is to answer three questions:

1. Can ANNs be trained to classify DS and FH correlation signatures, and if so,
at what level of classification performance?

2. Can a trained ANN’s response to previously unseen signatures be accurately
modeled by a transition probability matrix similar to those used
to describe communication channels? If so, can the symmetry of these matrices
be controlled? Control of matrix symmetry would allow network classification
responses to be tailored for a particular application.

3. Can classification accuracy be improved by using a majority vote decision rule
over the responses of an appropriate number of ANNs trained to classify the
same correlation signatures?
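To make the matrix model of question 2 concrete, the short Python sketch below estimates a transition probability matrix from a network's test responses. It assumes the responses are available as parallel lists of true classes and network decisions, with class 1 = DS and class 2 = FH; the function names are hypothetical and not part of any simulator. Entry (i, j) estimates P(decision j | true class i), so each row sums to 1, as in the transition matrix of a discrete channel.

```python
from collections import Counter


def transition_matrix(true_classes, decisions, classes=(1, 2)):
    """Estimate P(decision = j | true class = i) from paired test results.

    Row i corresponds to true class classes[i]; each row sums to 1, like the
    transition matrix of a discrete communication channel.
    """
    counts = Counter(zip(true_classes, decisions))
    matrix = []
    for i in classes:
        row_total = sum(counts[(i, j)] for j in classes)
        if row_total == 0:
            raise ValueError(f"no test exemplars of class {i}")
        matrix.append([counts[(i, j)] / row_total for j in classes])
    return matrix


def accuracy(true_classes, decisions):
    """Fraction of test exemplars classified correctly."""
    correct = sum(t == d for t, d in zip(true_classes, decisions))
    return correct / len(true_classes)


# Hypothetical test results: class 1 = DS, class 2 = FH.
truth = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
decisions = [1, 1, 1, 1, 2, 2, 2, 2, 1, 1]
print(transition_matrix(truth, decisions))  # [[0.8, 0.2], [0.4, 0.6]]
print(accuracy(truth, decisions))           # 0.7
```

In this sketch the matrix is symmetric when the two off-diagonal entries, P(2 | 1) and P(1 | 2), are equal, which is the binary symmetric channel case; an asymmetric matrix means one kind of misclassification is more likely than the other.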

1.3 Scope.

The intent of this thesis effort is to determine the applicability of ANN technology
to the classification of spread-spectrum correlation signatures. The classification
will be performed by means of ANN simulation software. The simulated ANN will
be trained to classify DS and linearly stepped FH correlation signatures. The results
of training will be evaluated by observing the ANN response to test data. The
second question posed in the problem statement presumes a positive answer to the
first question. Once ANNs are successfully trained to classify correlation signature
data, the transition probability matrix model will be examined.
If trained ANNs can be satisfactorily described by transition probability matrices,
then an attempt will be made to control the resulting symmetry. Finally, the
performance of composite networks composed of three single nets using a majority vote
rule will be compared to the performance of individual nets.
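A composite network of three nets can be sketched as a per-exemplar majority vote over the individual decisions. This is an illustrative Python sketch; the function name and the list-of-decisions input format are assumptions. With two classes and an odd number of voters a tie cannot occur, which makes an odd number of nets convenient here.

```python
from collections import Counter


def majority_vote(*decisions_per_net):
    """Combine per-exemplar class decisions from an odd number of networks.

    Each argument is one network's list of decisions over the same test set.
    Returns the most common decision for every exemplar.
    """
    if len(decisions_per_net) % 2 == 0:
        raise ValueError("use an odd number of networks to avoid ties")
    combined = []
    for votes in zip(*decisions_per_net):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical decisions of three nets on five test exemplars (classes 1 and 2).
net_a = [1, 2, 2, 1, 1]
net_b = [1, 1, 2, 2, 1]
net_c = [2, 2, 2, 1, 2]
print(majority_vote(net_a, net_b, net_c))  # [1, 2, 2, 1, 1]
```

The vote can only help when the nets make errors on different exemplars; if all three nets err on the same inputs, the composite inherits those errors.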

1.4 General Approach.

The approach to answering the questions posed in the problem statement was
broken down into three phases: basic research and experiment set-up, collection of
experiment data, and analysis of the data. The first phase involved a review of
current literature and research in the area of ANN theory and applications, selection
of appropriate ANN simulation software implementing the most promising topology
and training algorithm, and preparation of the input data sets for use with the
selected simulation software. The second phase consisted of training and testing the
performance of a number of ANNs under various conditions with correlation signature
data sets. In the final phase, the performance data were analyzed and conclusions
drawn. Conclusions and recommendations are based on observation and analysis of
experimental results.

1.5 Thesis Organization.

This chapter has served as an introduction and general overview of the thesis effort.
Chapter two discusses the fundamental concepts of ANNs and reviews recent ANN
research. Chapter three contains a complete description of the resources used,
definitions and notation, and how the experiments were constructed and performed.
Chapter four presents the results and analysis of the experiments described.
The fifth and final chapter contains specific conclusions along with recommendations
for future research.

II. Background Material


2.1 Introduction.

This background material is limited to discussion of work relevant to the specific
problem of applying ANNs as classifiers for continuous valued input vectors or
waveforms similar to the output of the AO correlator. The material is divided into
two parts. The first part introduces the basic concepts and terminology commonly
used in the ANN literature, providing only what is necessary to understand the
second part, which summarizes the major findings of recent work in the application
and enhancement of ANNs. For a more detailed study, the reader is referred to the
cited works.

2.2 Basic Concepts of ANNs.

An excellent beginner’s guide to ANNs can be found in an article by Lippmann
[3], which discusses the basic concepts of ANN models and describes six common
architectures. Two of these models, the Kohonen self-organizing feature map (Kohonen
net) and the multi-layer perceptron, can be applied to the problem at hand
because they accept continuous valued inputs. This thesis focuses on the
multi-layer perceptron. According to Lippmann, ANN models are extremely simplified
models based on the current understanding of how biological nervous systems work.
He states that an ANN model is fully specified by the computational characteristics
of its basic computing unit, the network topology, and the learning algorithm used for
training the net [3:4, 6]. The following sections discuss these parameters as they
apply to the multi-layer perceptron.

2.2.1 The Node. The fundamental computing unit of an ANN is the node,
which is analogous to the neuron in biological nervous systems. The computing
power of a node is fairly limited. Lippmann expresses its output as a nonlinear
function of the weighted sum of its inputs, as follows:

$$ Y = f\left( \sum_{i=0}^{N-1} X_i W_i - \theta \right) \qquad (2.1) $$

where

$Y$ = output
$W_i$ = connection weights
$X_i$ = inputs
$\theta$ = threshold
$N$ = number of inputs
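Equation 2.1 can be written directly as a small Python function. This is a minimal sketch in which the nonlinearity f is passed in as an argument; the names node_output and hard_limit are illustrative only.

```python
def node_output(inputs, weights, theta, f):
    """Equation 2.1: Y = f(sum over i of X_i * W_i, minus theta)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of the N inputs, offset by the node threshold theta.
    alpha = sum(x * w for x, w in zip(inputs, weights)) - theta
    return f(alpha)


def hard_limit(a):
    """Hard limiter nonlinearity: +1 for non-negative arguments, -1 otherwise."""
    return 1.0 if a >= 0 else -1.0


print(node_output([1.0, 0.5], [0.5, -0.25], 0.125, hard_limit))   # 1.0
print(node_output([1.0, 0.5], [0.5, -0.25], 0.125, lambda a: a))  # 0.25
```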

Depending on the application, the nonlinear function may be a hard limiter,
threshold logic, or a sigmoid, as shown in Figure 2.1. Note that a multi-layer
perceptron trained with the back propagation algorithm requires a continuously
differentiable function [3:17]. Typically this is a sigmoid function, also known as a
squashing function, of the form:

$$ f(\alpha) = \frac{1}{1 + e^{-\alpha}} \qquad (2.2) $$

where $\alpha$ is the argument of the function in Equation (2.1). The schematic
representation of Equation (2.1) is shown in Figure 2.2.

Figure 2.1. Nonlinear Functions [3:5]
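The three nonlinearities of Figure 2.1, and the reason back propagation favors the sigmoid, can be sketched as follows. The slope of the threshold-logic ramp is an assumption (unit slope between 0 and 1), and the function names are illustrative. The hard limiter's derivative is zero everywhere except at the jump, so a gradient search gets no useful signal from it; the sigmoid's derivative, f(α)(1 − f(α)), is smooth and cheap to compute from the node output itself, and it is the Y(1 − Y) factor found in the standard back propagation error terms.

```python
import math


def hard_limiter(alpha):
    """+1 for non-negative arguments, -1 otherwise; not differentiable at 0."""
    return 1.0 if alpha >= 0 else -1.0


def threshold_logic(alpha):
    """Linear ramp clipped to [0, 1] (unit slope assumed); has corners."""
    return min(1.0, max(0.0, alpha))


def sigmoid(alpha):
    """Squashing function 1 / (1 + exp(-alpha)), computed without overflow."""
    if alpha >= 0:
        return 1.0 / (1.0 + math.exp(-alpha))
    z = math.exp(alpha)  # alpha < 0, so z lies in (0, 1)
    return z / (1.0 + z)


def sigmoid_derivative(alpha):
    """Derivative f'(alpha) = f(alpha) * (1 - f(alpha)), smooth everywhere."""
    y = sigmoid(alpha)
    return y * (1.0 - y)


for a in (-2.0, 0.0, 0.5, 2.0):
    print(a, hard_limiter(a), threshold_logic(a), round(sigmoid(a), 4))
```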

2.2.2 Topology. When many nodes are massively interconnected in parallel
or in layers, the network as a whole is capable of performing complex computations
at high speeds. The way in which nodes are interconnected defines a topology. We
are particularly interested in the topology of multi-layer perceptrons.

A multi-layer perceptron is composed of one or more layers of nodes between
the input and the output layer of nodes. These inner layers are referred to as hidden
layers. Each node in a layer is connected to all the nodes in the preceding and
following layers. Thus, each node receives input from every node in the previous
layer (or from the net inputs) and sends its output to all nodes in the next layer,
as shown in Figure 2.3. Although it is possible to have more, the usual number of
layers is three. According to Lippmann [3:16], "... no more than three layers are
required in perceptron-like feed-forward nets because a three-layer net can generate
arbitrarily complex decision regions." It can be shown that two-layer networks also
have this ability, but they may require more training iterations to reach an equivalent
solution [4].

Figure 2.2. Computational Capability of a Node [3:5]
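The layer-to-layer flow of Figure 2.3 amounts to applying Equation 2.1 once per node, one layer at a time. A minimal sketch, assuming sigmoid nodes and fully connected layers stored as nested Python lists (an illustrative layout, not that of any particular simulator):

```python
import math


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


def layer_forward(inputs, weights, thresholds):
    """One fully connected layer: node j applies Equation 2.1 to every input.

    weights[j][i] is the weight from input (or lower-layer node) i to node j.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) - theta)
            for row, theta in zip(weights, thresholds)]


def mlp_forward(inputs, layers):
    """Pass an input vector through each (weights, thresholds) layer in turn."""
    activation = list(inputs)
    for weights, thresholds in layers:
        activation = layer_forward(activation, weights, thresholds)
    return activation


# Hypothetical 3-input net with hidden layers of 4 and 2 nodes and 2 outputs.
sizes = [3, 4, 2, 2]
layers = [([[0.1] * sizes[l] for _ in range(sizes[l + 1])], [0.0] * sizes[l + 1])
          for l in range(len(sizes) - 1)]
print(mlp_forward([1.0, 0.0, -1.0], layers))
```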

2.2.3 Learning Algorithms. When individual nodes are interconnected and
organized into a network topology, the network can be trained with a learning
algorithm to perform a specific task. The learning algorithm specifies the way in which
the weights between nodes are updated during training. Initially, the weights of an
untrained net are set to small random values. A number of input examples, referred
to as the training data set or training set exemplars, are presented to the network one
at a time. At each presentation, an error signal is generated and the weights are updated
so that the error is minimized via a gradient search. Training continues until the
weights no longer change or the change is less than some threshold value, at which
point training is terminated [3]. The heart of the training algorithm is the way the
error signal is computed and minimized.

Figure 2.3. Three-layer Perceptron [3:16]

Training algorithms can be separated into three categories: unsupervised,
supervised, and self-supervised. In unsupervised training, the network is given no
information as to which class the input belongs to, and the error signal is computed
solely as a function of the current input and output. In supervised learning, on the
other hand, the net is provided information about the correct classification (the
desired output) for the present input, so the error signal is a function of the current
network output and the desired output [3:19]. In self-supervised learning, the
network monitors its performance internally, feeding back an error signal to itself [5].

The multi-layer perceptron is trained with supervision using the back propagation
gradient algorithm, which is a generalized form of the least mean square (LMS)
algorithm. Because there are multiple layers, an error signal must be produced for
each layer. For the output layer, the error signal is given by:

$$ \delta_j = Y_j (1 - Y_j)(d_j - Y_j) \qquad (2.3) $$

where $Y_j$ is the output of node $j$ and $d_j$ is the desired output of node $j$.
Equation 2.3 follows from the partial derivative of the error with respect to the output
layer weights [4]. For the hidden layers, the error is computed a little differently, since
there is no way to specify the desired output of hidden nodes. An error signal for a
hidden layer is given by:

$$ \delta_j = X'_j (1 - X'_j) \sum_k \delta_k W_{jk} \qquad (2.4) $$

where $X'_j$ is the output of hidden node $j$, and $k$ ranges over all nodes in the layer
above. The weights and error signals are computed and updated from the output
layer back to the input in a recursive fashion. The weights in each layer are updated
according to:

$$ W_{ij}(t+1) = W_{ij}(t) + \eta \, \delta_j X'_i + \alpha \left( W_{ij}(t) - W_{ij}(t-1) \right) \qquad (2.5) $$

where

$W_{ij}$ = weight from hidden node (or input) $i$ to node $j$
$X'_i$ = output of node $i$ (or input $i$)
$\eta$ = learning rate
$\delta_j$ = error term for node $j$
$\alpha$ = momentum gain

The learning rate, η, scales the size of each weight change and therefore how fast
the weights converge. The momentum gain, α, weights the contribution of the previous
update to the current update. Both of these parameters are set to a value between
0 and 1. Multi-layer perceptrons trained with the back propagation algorithm can be
used to determine which class an unknown input is most similar to. The input to the
trained network may be corrupted by noise or in some way different from the inputs
used for training. In either case, the network must classify an input that is not
exactly the same as the inputs used to train the network [3].
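Equations 2.3 through 2.5 translate into three one-line functions. This is a sketch for a single node or weight, assuming sigmoid nodes; the function names are illustrative. Setting alpha to zero recovers the plain gradient method, in which the previous weight change is ignored.

```python
def output_delta(y_j, d_j):
    """Equation 2.3: error term for output node j (actual y_j, desired d_j)."""
    return y_j * (1.0 - y_j) * (d_j - y_j)


def hidden_delta(x_j, deltas_above, weights_to_above):
    """Equation 2.4: error term for hidden node j with output x_j.

    deltas_above[k] is the error term of node k in the layer above, and
    weights_to_above[k] is the weight W_jk from node j to that node.
    """
    return x_j * (1.0 - x_j) * sum(d * w for d, w in zip(deltas_above, weights_to_above))


def updated_weight(w_now, w_prev, eta, delta_j, x_i, alpha):
    """Equation 2.5: gradient step plus momentum term alpha * (W(t) - W(t-1))."""
    return w_now + eta * delta_j * x_i + alpha * (w_now - w_prev)


# One worked step with eta = 0.3 and alpha = 0.8:
d_out = output_delta(0.75, 1.0)            # 0.75 * 0.25 * 0.25 = 0.046875
d_hid = hidden_delta(0.5, [d_out], [0.5])  # 0.5 * 0.5 * 0.0234375 = 0.005859375
w_new = updated_weight(0.5, 0.5, 0.3, d_out, 0.5, 0.8)
print(d_out, d_hid, w_new)
```

Because the momentum term reuses the previous change, successive steps in a consistent direction accumulate, while steps that alternate in sign partly cancel.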

2.3 Current Research in ANNs.

The following paragraphs summarize some of the recent research efforts in
which ANNs were used to classify real world, continuous valued data. The
discussions are limited to presenting the purpose or objectives of the research, a
general description of the experiment, and the overall results or major findings. The
research efforts are divided into those that were application oriented and those
that were enhancement oriented. The application efforts included experiments in
training three-layer perceptrons with the back propagation algorithm as described
in Section 2.2.3. The enhancement efforts were directed more towards improving
training performance by modifying the back propagation algorithm or network
structures.

2.3.1 Application Oriented Research. A great deal of ANN research in the
military community is directed towards the problem of recognizing and classifying
targets from sensor data. Successful development of this capability would be a
first step in constructing autonomous weapons systems. Troxel [6] and Gorman [7]
conducted research in this area.

Troxel trained a three-layer perceptron to classify multifunction laser radar
data of tanks and trucks at various aspect angles. His approach was to first obtain
segmented target images using a Doppler segmenter developed by Ruck [8]. The
segmented images were then transformed into a position, scale, and rotation invariant
(PSRI) feature space. A correlation was performed between the transformed data
and the feature space itself. The correlation peak was found and a window of 49
data points around the peak was extracted for classification. Once these data points
were normalized, Troxel [6:1-594] said the data could "...be thought of as a 49
dimensional vector of length 1." These vectors were then used to train the networks.
Troxel reported a maximum classification accuracy of 80% on test data sets. In
addition, he suggested a procedure for selecting an appropriate number of nodes for
each hidden layer and observed that larger numbers of training vectors would be
needed to ensure good classification performance on real world data.
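The last two preprocessing steps described for Troxel's data, extracting a 49-point window around the correlation peak and scaling it to unit length, can be sketched in a few lines. This illustrates the steps as summarized here, not Troxel's code; the handling of peaks near the ends of the array is an assumption, and the Doppler segmentation and PSRI transform are not shown.

```python
import math


def peak_window(correlation, width=49):
    """Return `width` samples around the largest value of `correlation`.

    Assumption: near either end of the array the window is shifted so that
    it stays inside the data rather than being padded.
    """
    if len(correlation) < width:
        raise ValueError("signal shorter than window")
    peak = max(range(len(correlation)), key=correlation.__getitem__)
    start = min(max(peak - width // 2, 0), len(correlation) - width)
    return correlation[start:start + width]


def unit_length(vector):
    """Scale a vector to Euclidean length 1."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


signal = [0.0] * 100
signal[60], signal[61] = 3.0, 4.0  # hypothetical correlation peak at index 61
vec = unit_length(peak_window(signal))
print(len(vec), vec[23], vec[24])  # 49 0.6 0.8
```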

Similar experimental work done by Gorman [7] involved the classification of
sonar returns from undersea objects. The primary objective of Gorman’s experiments
was to determine whether an ANN could learn to distinguish the sonar returns from a
metal cylinder (target) from the returns from a cylindrical rock (non-target). The raw
data for the experiment were spectrograms of the sonar returns of the cylinder and rock
at various aspect angles, where aspect angle refers to the angle from which the
objects were illuminated with the sonar pulse. The sonar returns were transformed
into a spectral envelope representation by integrating over sampling apertures of the
short-term Fourier transform spectrogram. Sixty samples of the spectral envelope
were normalized and used as input to the network. Two-layer perceptrons with
various numbers of hidden nodes were trained and their performance tested. The
networks were trained on two types of data sets: aspect-angle independent and
aspect-angle dependent. In the former case, the sonar returns used for training were
selected at random without regard to aspect angle. In the latter case, the returns used
for training were chosen so as to ensure that all aspect angles were represented. The
best test set classification accuracy, 90.4%, was achieved by a network with 12 hidden
nodes trained on the aspect-angle dependent training set. In general, Gorman found
that greater numbers of hidden nodes reduced performance variations, and that training
with aspect-angle dependent data sets resulted in the best classification accuracy.
In other words, for best results, the training set should contain examples of all
the variations of the objects to be recognized.

2.3.2 Enhancement Oriented Research. Several of the most recent ANN research
efforts conducted at the Air Force Institute of Technology (AFIT) involved
exploration of methods to improve network training performance. Training
performance is measured by the number of training iterations required for the network
weights (and thus its classification performance) to converge to relatively constant
values. The classification accuracy of a network is simply the percentage of correct
responses to a set of inputs. Usually, a test set of exemplars not included in the
training set is used for this purpose. The approaches discussed in the following
paragraphs include:

1. modifying error minimization algorithms.

2. modifying the error signal.

3. modifying the computational functions of the network nodes.

4. combining two network structures and algorithms into one larger network.

Each of these approaches was recently investigated in thesis efforts at AFIT. In
addition, the performance of each enhanced network was compared to that of the
basic three-layer perceptron trained with the first order back propagation gradient
method described in Section 2.2.3. All of the networks were eventually trained
on the Ruck [8] data sets, thus establishing a common reference for judging the
algorithm modifications.

Piazza [9] performed experiments training three-layer perceptrons using a
second order error minimization technique in the back propagation algorithm. He also
trained networks with two first order methods: the basic back propagation gradient
method, and back propagation with a momentum term (momentum method). In
the gradient method, α in Equation 2.5 is set to zero. Piazza found the training
performance of the momentum and second order methods significantly better than that
of the gradient method, and the second order method only slightly better than the
momentum method. Average test set classification accuracies were 75%, 78%,
and 78% for the gradient, momentum, and second order methods, respectively.

Several approaches to improving network training performance were
demonstrated by Lutey [10]. He attacked the problem in three different ways: modifying
the error generating function, varying the rise rate of the sigmoid, and implementing
more complex weighting functions in the nodes. Networks were trained with each of
the proposed improvements and their performances compared to the baseline
performance of the gradient method. As before, all networks were trained on the Ruck
data. All three techniques significantly improved training performance by reducing
the iterations required for convergence relative to the baseline case.

One final technique for enhancement was suggested by Tarr [11]. He
performed an experiment in which two different network structures, a Kohonen net and
a multi-layer perceptron, were combined into one network. The Kohonen net served
to organize and simplify the input data, and its outputs were then fed to the
perceptron, which performed the classification. Again, the result was a significant
reduction in training iterations to reach convergence. However, the hybrid network
had many more nodes than a three-layer back propagation perceptron of equivalent
classification performance. Tarr observed that the hybrid net appeared to trade a
reduction in training time (or iterations) for an increase in the number of nodes.
The test set classification accuracy for the best case hybrid network
was 74%, which was essentially equivalent to the baseline gradient back propagation
perceptron. In addition, Tarr observed that hybrid networks performed better than
multi-layer perceptrons when there were ambiguous decision regions in the data set,
and vice versa for unambiguous data sets.

2.4 Summary.

The problem at hand is to determine whether ANNs can be used to classify the
correlation signatures of spread-spectrum communications signals. The literature
examined in previous sections indicates that it should be possible to train a
three-layer perceptron to yield a classification accuracy in the range of roughly 75% to
90% using any one of the techniques or algorithms described. Also, in most cases, some
sort of data transformation and/or normalization was performed on real world data
before presentation to an ANN. Finally, most researchers average network performance
over a number of training trials. This implies that random processes are at work and
that statistical analysis of the distributions of performance is the appropriate
analysis tool.

III. Methodology


3.1 Introduction.

This chapter details how the experiments were performed. First, the resources
used are described. This is followed by a description of how each of the experiments
was designed and implemented. Finally, the manner in which the results of the
experiments were analyzed is given.
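As an illustration of the kind of analysis this implies, the following sketch summarizes a set of per-trial test accuracies by their sample mean, sample standard deviation, and a two-sided t confidence interval for the mean. It assumes the trial results are independent and approximately normally distributed; the critical value must be looked up for n − 1 degrees of freedom (about 2.045 for 30 trials at the 95% level) and is passed in rather than computed. The function name is hypothetical.

```python
import math
import statistics


def summarize_trials(accuracies, t_critical):
    """Return (mean, sample standard deviation, (lower, upper)) for the trials."""
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two trials")
    mean = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)  # uses n - 1 in the denominator
    half_width = t_critical * sd / math.sqrt(n)
    return mean, sd, (mean - half_width, mean + half_width)


# Hypothetical accuracies from five training trials (t = 2.776 for 4 d.o.f., 95%).
mean, sd, ci = summarize_trials([0.78, 0.80, 0.82, 0.79, 0.81], t_critical=2.776)
print(f"mean={mean:.3f} sd={sd:.4f} 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```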

3.2 Resources.

This section describes the resources used to accomplish the experiments performed
in this thesis: the spread-spectrum correlation product data files, the artificial
neural network (ANN) simulator, and the preprocessing performed on the correlation
product data in order to use it with the simulator.

3.2.1 The Correlation Product Data. The spread-spectrum correlation
signatures were obtained from the sponsor of this thesis, Harry Diamond Laboratories
(HDL). The signatures were generated by simulating various direct sequence (DS)
and frequency hopped (FH) signals and feeding them into an acousto-optic (AO)
correlator. Different signatures of each type of signal were generated by varying
several modulation parameters: chip rate, carrier frequency, pseudo-random code, etc.
The FH signals were not driven by a pseudo-random code, but were stepped across
frequency ranges by a linear stepper. For the remainder of this thesis, exemplars
derived from DS signatures will be known as class 1 exemplars, while those derived
from FH signatures will be class 2 exemplars.

Figure 3.1. Path of Correlation Product Data. (1) AO correlator output (2) output
of MC6800 microprocessor (3) upload files to mainframe (4) files
transferred across MILNET to AFIT [2]

The output of the AO correlator for a given signature was sampled and written
to an ASCII file as a column of numbers. Each file contained 1,000 data points.
The files were transmitted from HDL to an AFIT computer via MILNET using the
DoD file transfer protocol (FTP). Figure 3.1 shows a block diagram of the process.
The files were then downloaded to a personal computer for preprocessing into
the format required by the ANN simulator.
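Reading one of these single-column files is the first preprocessing step. A minimal Python sketch, assuming one number per line and tolerating blank lines; the function name and the file name in the note below are hypothetical, and the point-count check simply reflects the 1,000-point files described above.

```python
import io


def read_signature(lines, expected_points=1000):
    """Parse one correlation signature stored as a single column of numbers.

    `lines` may be any iterable of text lines, such as an open file. Blank
    lines are skipped, and the point count is checked against the expected
    file length.
    """
    samples = [float(line) for line in lines if line.strip()]
    if len(samples) != expected_points:
        raise ValueError(f"expected {expected_points} points, got {len(samples)}")
    return samples


# Stand-in for a real signature file: 1,000 hypothetical values, one per line.
fake_file = io.StringIO("\n".join(str(0.5 * i) for i in range(1000)) + "\n")
signature = read_signature(fake_file)
print(len(signature), signature[:3])
```

With a real file the same function can be given an open file object, for example `with open("ds_signature_01.txt") as f: read_signature(f)`, where the file name is illustrative.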
3.2.2 The ANN Simulator. The ANN simulator chosen for this thesis was the NeuralGraphics simulator written by Captain Greg Tarr [11]. The simulator was originally designed to run on a Silicon Graphics IRIS workstation. After a minor modification to turn off the graphics display, the software was ported to and run on a SUN 4 workstation. Two other minor modifications were made to support the data requirements of the experiments. First, a terminal test routine was added. The routine writes the network response to the test data set to a file. The file contains a test exemplar label, the true class, and the network's classification decision for each exemplar in the test data set. In addition, a training history file was written for each network. The history files contain the outcome of performance tests after every 1,000 training iterations until training was terminated. A sample of both types of files can be found in Appendix B. The second modification concerned the setting of the seeds for the random number generator. The generator is used to select the initial weight values of the nets at the start of training and to randomly select an exemplar for presentation to the net during training. These training conditions were controlled by varying how and when the generator seed was selected. For example, when it was desired that successive nets be trained from the same initial weight state, the seed for selecting the initial weights was set to the same arbitrary constant for each net trained. If it was desired that successive nets be trained from different initial weight states, the seed was set to the current value of the system real-time clock. Control over the order of presentation of training exemplars was obtained in a similar manner.
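The seed-control scheme described above can be sketched as follows. This is a minimal illustration in Python, not the NeuralGraphics source: the function names, the constant `FIXED_SEED`, and the uniform initial-weight range are all hypothetical. A fixed seed reproduces the same sequence for every net trained, while a clock-derived seed gives each net a different one.

```python
import random
import time

FIXED_SEED = 12345  # hypothetical "arbitrary constant"


def make_generators(vary_weights, vary_order):
    """Return (weight_rng, order_rng) for training one net.

    A fixed seed gives every net the same initial weights (or presentation
    order); a seed taken from the system clock gives each net a different one.
    """
    weight_seed = time.time_ns() if vary_weights else FIXED_SEED
    order_seed = time.time_ns() if vary_order else FIXED_SEED
    return random.Random(weight_seed), random.Random(order_seed)


def initial_weights(rng, n_weights, scale=0.5):
    """Uniform initial weights in [-scale, scale] (the range is an assumption)."""
    return [rng.uniform(-scale, scale) for _ in range(n_weights)]


def presentation_order(rng, n_exemplars, n_iterations):
    """Index of the exemplar presented at each training iteration."""
    return [rng.randrange(n_exemplars) for _ in range(n_iterations)]


# Run 4 style: initial weights vary from net to net, presentation order is fixed.
weight_rng, order_rng = make_generators(vary_weights=True, vary_order=False)
```

Keeping separate generators for the weights and the presentation order is what allows one condition to be held fixed while the other varies.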

The NeuralGraphics simulator allows the user to select one of several network structures and training algorithms. The back propagation networks all use the update rules described in Section 2.2.3. The learning rate η was 0.3, and the momentum gain α was 0.8. The simulator also allows user specification of the number of nodes in the hidden layers of the network. As pointed out by Piazza [9], Tarr [11], and Troxel [6], there is no known method for selecting the best number and arrangement of hidden layer nodes. In short, one usually resorts to a trial-and-error search for a combination that yields reasonably good results. Preliminary training runs showed that the networks trained relatively well over a wide range of numbers and arrangements of hidden layer nodes. Of the various combinations tried, 18 nodes in the first hidden layer and 10 nodes in the second hidden layer appeared to yield the most consistent classification performance. In addition, it was observed that performance usually did not improve after 20,000 training iterations. These parameter values were used to train all networks in this thesis, and training was terminated after 20,000 iterations. For further information on the NeuralGraphics simulator, the reader is referred to [11].
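As a point of reference, one weight update of the generic gradient-descent-with-momentum form, using the η = 0.3 and α = 0.8 values quoted above, can be sketched as below. Section 2.2.3 is not reproduced here, so treating the simulator's rule as Δw(t) = -η ∂E/∂w + α Δw(t-1) is an assumption, and `momentum_step` is a hypothetical name.

```python
def momentum_step(weights, gradients, prev_deltas, eta=0.3, alpha=0.8):
    """One gradient-descent-with-momentum update.

    delta_w(t) = -eta * dE/dw + alpha * delta_w(t - 1)
    Returns the updated weights and the deltas to reuse on the next step.
    """
    deltas = [-eta * g + alpha * d for g, d in zip(gradients, prev_deltas)]
    new_weights = [w + d for w, d in zip(weights, deltas)]
    return new_weights, deltas


# Two steps on a single weight with a constant gradient of 0.5.
w, d = [1.0], [0.0]
for _ in range(2):
    w, d = momentum_step(w, [0.5], d)
```

With α = 0.8, a constant gradient makes successive steps grow toward -η g / (1 - α), five times the plain gradient step, which is the usual motivation for a momentum term.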

3.2.3 Data Set Construction. As previously stated, the correlation signature data files contain 1,000 data points and are not normalized. The numbers of output and input nodes of a network are determined by the number of classes to be recognized and the number of elements in the input exemplar, respectively. Since presenting all 1,000 elements of a data file as inputs to a network was beyond the capacity of the simulation software, some preprocessing was necessary. All of the data files containing the correlation signatures were processed in the following manner. First, the data files were reduced to 500 points by averaging consecutive pairs of data points. Next, the data point with the largest absolute value was found, and all data points were divided by this value. The result of these two steps was a 500-point signature pattern linearly scaled to values between -1 and 1. The final step was to extract a 50-point window centered about the peak positive value of the correlation product pattern: the position of the maximum positive value was found, and the 50 data points roughly centered on this position were extracted and written to a file. A total of 101 class 1 and 108 class 2 exemplar patterns were available to train the networks. The data sets used by the simulator to train networks for the different experiments were constructed from this pool of exemplars. The details of the construction of the data sets may be found in Appendix B.
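The three preprocessing steps can be sketched as below. This is a minimal Python illustration, not the author's preprocessing program. The text says only that the window is "roughly centered" on the peak, so the clamping of the window at the ends of the 500-point pattern is an assumption, and `preprocess` is a hypothetical name.

```python
def preprocess(signature, window=50):
    """Reduce a 1,000-point correlation signature to a 50-point input window."""
    if len(signature) % 2:
        raise ValueError("expected an even number of points")
    # Step 1: average consecutive pairs of points (1,000 -> 500 points).
    reduced = [(signature[i] + signature[i + 1]) / 2
               for i in range(0, len(signature), 2)]
    # Step 2: divide by the largest absolute value so all points lie in [-1, 1].
    peak_abs = max(abs(x) for x in reduced)
    if peak_abs == 0:
        raise ValueError("signature is all zeros")
    scaled = [x / peak_abs for x in reduced]
    # Step 3: take a window roughly centered on the maximum positive value,
    # shifted inward (assumption) if the peak lies near either end.
    peak = max(range(len(scaled)), key=lambda i: scaled[i])
    start = min(max(peak - window // 2, 0), len(scaled) - window)
    return scaled[start:start + window]
```

Note that the normalizing value in step 2 may come from a large negative excursion, while the window in step 3 is always placed on the largest positive value.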

3.2.4 Definitions and Notation. Before proceeding with the descriptions of the experiments, the terms and notation used in the remainder of this thesis must be defined. The NeuralGraphics simulator uses several metrics to evaluate network performance: total error, right classification, and good classification. The error for a given exemplar is defined as

$$\mathrm{error} = \left( \sum_{i=1}^{m} (d_i - y_i)^2 \right)^{1/2} \qquad (3.1)$$

where $m$ is the number of output nodes, $d_i$ is the desired output of node $i$, and $y_i$ is the actual output of node $i$. The total error over the entire training set is the sum of the errors for each training exemplar. The network yields a right classification if the error at each output node, for that exemplar, is less than 0.2. A good classification only requires that the maximum output occur at the node representing the class of the input exemplar [11]. For example, if the desired output for a given exemplar were [1, 0, 0], then [0.91, 0.09, 0.09] would be a right classification and [0.7, 0.5, 0.4] would be a good, but not a right, classification. The percentages of right and good classifications are calculated for both the training set exemplars and the test set exemplars. Together with the total error, these give the five performance metrics that, as previously mentioned, are computed after every 1,000 training iterations.
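The definitions of error, right classification, and good classification can be checked against the worked example above with the short sketch below. The function names are hypothetical, and reading "the error at each output node" as the absolute difference |d_i - y_i| is an assumption.

```python
import math


def exemplar_error(desired, actual):
    """Equation (3.1): square root of the summed squared output errors."""
    return math.sqrt(sum((d - y) ** 2 for d, y in zip(desired, actual)))


def total_error(desired_set, actual_set):
    """Total error: sum of the exemplar errors over the training set."""
    return sum(exemplar_error(d, y) for d, y in zip(desired_set, actual_set))


def is_right(desired, actual, tol=0.2):
    """Right classification: |d_i - y_i| < 0.2 at every output node."""
    return all(abs(d - y) < tol for d, y in zip(desired, actual))


def is_good(desired, actual):
    """Good classification: the largest output is at the desired class node."""
    target = max(range(len(desired)), key=lambda i: desired[i])
    winner = max(range(len(actual)), key=lambda i: actual[i])
    return winner == target


# The example from the text: both outputs are good, only the first is right.
example_right = is_right([1, 0, 0], [0.91, 0.09, 0.09])
example_good_only = is_good([1, 0, 0], [0.7, 0.5, 0.4]) and not is_right([1, 0, 0], [0.7, 0.5, 0.4])
```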

Since the output $y$ is a function of the inputs and the set of weights $w_{ij}$, for a fixed set of training inputs it is natural to envision an error surface generated by allowing each weight to vary over its entire range of possible values. The result is an n-dimensional error surface for the given set of training exemplars, where n is the total number of weights in the network. The back propagation algorithm seeks the global minimum on this surface: in other words, the specific combination of the n weights that yields the lowest total error. For further information on this concept, Tarr [11] provides a very good discussion and demonstration of the two-dimensional case.
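To give a sense of how large n is in these experiments, the sketch below counts the weights of a fully connected network with 50 inputs (the 50-point window of Section 3.2.3) and 18 and 10 hidden nodes (Section 3.2.2). Two output nodes, one per class, is an assumption, and because this section does not say whether the simulator uses bias weights, both counts are shown.

```python
def count_weights(layer_sizes, include_bias=True):
    """Number of adjustable weights n in a fully connected feed-forward net."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += (fan_in + (1 if include_bias else 0)) * fan_out
    return total


LAYERS = [50, 18, 10, 2]  # inputs, hidden layer 1, hidden layer 2, outputs (assumed)

n_with_bias = count_weights(LAYERS)                         # 1130
n_without_bias = count_weights(LAYERS, include_bias=False)  # 1100
```

Either way, the error surface being searched has on the order of a thousand dimensions, far beyond the two-dimensional case that can be visualized.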

As will be seen in the next section, the training history of each performance metric mentioned will be examined for sets of networks trained under various conditions. However, the bulk of the experiments examine the effect of the training conditions on only one performance measure after completion of training: the percent of good classifications on test data set exemplars. Furthermore, in this thesis, this metric will be viewed in probabilistic terms. For example, assume a certain trained network is tested with a data set having 50 class 1 exemplars and 50 class 2 exemplars. If the network yields a good classification for 80 of the 100 test exemplars, this will be expressed as P(good) = 0.80. Also, if the network yields a good classification on 35 of the 50 class 1 exemplars and 45 of the 50 class 2 exemplars in the test data set, then these observations will be expressed as the conditional probabilities P(1 | 1) = 0.70 and P(2 | 2) = 0.90. It follows that the conditional probabilities for incorrect classification are P(2 | 1) = 1 - P(1 | 1) and P(1 | 2) = 1 - P(2 | 2). For this example, the proportions of the class 1 and class 2 exemplars in the test set can be expressed as the a priori probabilities P(1) = P(2) = 0.50, so that P(good) = P(1)P(1 | 1) + P(2)P(2 | 2) = 0.80. The characterization of network performance in this manner is similar to the way information channels are characterized in the communications field. The conditional probabilities are known as channel transition probabilities and are collectively referred to as the transition probability matrix, or P matrix. Networks trained for a two-class recognition problem are modeled as a binary channel. For further information regarding channel models, the text by Hamming [12] is an excellent choice. Characterization of network performance with this model was inspired by the confusion matrix found in the work of Piazza [9]. These confusion matrices were simply counts of how the input exemplars of each class were distributed over the output classifications by the network. If the number of test exemplars is sufficiently large, it is not hard to extend the confusion matrix concept to the P matrix described above: dividing each row of counts by the number of test exemplars of that class yields the estimated transition probabilities.
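The relationship between the class counts, the P matrix, and P(good) can be sketched as follows, using the 35/50 and 45/50 example above. The function names are hypothetical. The `prior_1` argument shows how a P(good) value can be calculated from the same matrix for a test set with a different class mix, such as the 40/60% mix used later for the calculated distribution CR1PG.

```python
def transition_matrix(correct_1, total_1, correct_2, total_2):
    """Estimate the binary-channel P matrix from good-classification counts.

    Row i is the true class and column j the network's decision,
    so P[i][j] = P(j | i).
    """
    p11 = correct_1 / total_1
    p22 = correct_2 / total_2
    return [[p11, 1 - p11],
            [1 - p22, p22]]


def p_good(P, prior_1):
    """P(good) = P(1) P(1 | 1) + P(2) P(2 | 2) for a two-class test set."""
    return prior_1 * P[0][0] + (1 - prior_1) * P[1][1]


P = transition_matrix(35, 50, 45, 50)
baseline = p_good(P, 0.5)   # 50/50 test set
shifted = p_good(P, 0.4)    # 40/60 test set, same P matrix
```

If the P matrix is stationary, only the priors change between the two test sets, so P(good) moves from 0.80 to 0.4(0.70) + 0.6(0.90) = 0.82 purely because of the class mix.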

Finally, a few words regarding the naming conventions used in the rest of this thesis are in order. Since there are random processes involved in training networks, P(good) is a random variable with some probability density function (PDF), and its values will be distributed about some mean. The effects of different training or test conditions must therefore be examined through the differences in the distributions of P(good). A sample of each of these distributions, or PDFs, will be obtained by training a number of networks under a given set of conditions and observing the outcomes. In the next section, a total of five sets of training conditions will be specified. In addition, there will be three different methods for generating distributions of interest. Specifically, the P(1 | 1), P(2 | 2), and P(good) distributions will be generated for (1) observed outcomes of individual networks, (2) outcomes of majority vote networks constructed from the observed outcomes of three individual networks, as shown in Figure 3.2, and (3) the calculated outcomes of majority vote nets constructed using the probability matrices of three individual nets. As one can see, quite a number of different PDFs will need to be named and referred to. The following shorthand naming conventions shall be used. The letter R followed by a single digit specifies a set of training conditions for a training run. The letter S indicates that the distribution is for single nets, and the letter M signifies that it is for majority vote nets. The appearance of P11, P22, or PG designates the distribution as P(1 | 1), P(2 | 2), or P(good), respectively. A letter C prepended to the name indicates a calculated distribution. As an example, the string R1SP11 refers to the P(1 | 1) distribution for the single nets of Run 1. Similarly, the string CR5MPG refers to the calculated P(good) distribution for majority vote networks of Run 5.
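The naming convention is mechanical enough to state as a small function. This sketch is only a restatement of the rules above; `distribution_name` is a hypothetical helper, and allowing an empty net type covers runs such as Run 1 and Run 1a, where the S designation is omitted.

```python
def distribution_name(run, metric, net_type="", calculated=False):
    """Build a distribution name such as R1SP11 or CR5MPG.

    run: run designation, e.g. 1, "1A", 5
    metric: "P11", "P22" or "PG"
    net_type: "S" (single nets), "M" (majority vote nets) or "" if omitted
    calculated: prepend "C" for a calculated distribution
    """
    if metric not in ("P11", "P22", "PG"):
        raise ValueError(f"unknown metric {metric!r}")
    if net_type not in ("", "S", "M"):
        raise ValueError(f"unknown net type {net_type!r}")
    return ("C" if calculated else "") + f"R{run}{net_type}{metric}"


single_example = distribution_name(1, "P11", "S")                    # "R1SP11"
calculated_example = distribution_name(5, "PG", "M", calculated=True)  # "CR5MPG"
```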
3.3 Experiment Design.


All of the experiments defined in this section are designed to examine the influence that various conditions have on the good classification performance metric. For each experiment, the purpose, the network training and testing conditions, and the expected results will be specified, along with the data requirements and distribution nomenclatures.

3.3.1 Training Performance. While not an experiment in the true sense of the word, documenting how the networks train to their final states is of general interest: it provides a baseline performance against which future research using the same data sets can be compared. The raw data required to document the training history of the performance metrics were the history files of the nets trained in the runs specified in the following sections. Each performance metric will be averaged over the 30 nets trained in a given run at 1,000-iteration intervals. These averages will be plotted against iterations to yield a training performance curve.
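The averaging step can be sketched as below. This is an illustration only: each net's history is assumed to be a list of one metric's values at 1,000, 2,000, ... iterations, and `training_curve` is a hypothetical name. The sample standard deviation is returned as well, since the test-data curves in Chapter IV are plotted as the average plus and minus one standard deviation.

```python
import statistics


def training_curve(histories):
    """Average one metric over nets at each 1,000-iteration checkpoint.

    histories: one list per net holding the metric at iterations 1000, 2000, ...
    Returns (iterations, means, sample standard deviations).
    """
    n_points = min(len(h) for h in histories)
    iterations = [1000 * (k + 1) for k in range(n_points)]
    means = [statistics.mean([h[k] for h in histories]) for k in range(n_points)]
    stdevs = [statistics.stdev([h[k] for h in histories]) for k in range(n_points)]
    return iterations, means, stdevs
```

For a run of 30 nets trained for 20,000 iterations, `histories` would hold 30 lists of 20 values each.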


3.3.2 Characterization of ANN Performance with the P Matrix Model. In this experiment, the validity of using a P matrix to characterize the performance of trained networks will be tested. Specifically, the stationarity of the matrix over different test data sets, for networks trained in the same manner, will be examined. Thirty networks were trained using the same 102 training exemplars. The initial weights at the start of training and the presentation order of training exemplars were different for each net trained. These networks were tested with a baseline data set having 50 class 1 exemplars and 50 class 2 exemplars. The same networks were also tested with data sets having a 40/60% mix of class 1 to class 2 exemplars. Thirty of these data sets were constructed from the baseline test set by randomly removing 12 class 1 exemplars and adding seven class 2 exemplars, giving 38 class 1 and 57 class 2 exemplars (95 in total) in each set. The same seven class 2 exemplars were added to each test set. The results of testing the 30 nets with the baseline test set will be called Run 1, while the results of testing the 30 nets with the data sets having the 40/60% exemplar mix will be called Run 1a. Note that the S designation for single nets has been omitted in this case, since no majority vote nets will be constructed for these runs.
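The construction of the Run 1a test sets can be sketched as follows. The exemplar labels and the function name are hypothetical, and Python's `random.Random.sample` stands in for whatever random selection was actually used. The sketch also confirms the arithmetic: 38 of 95 exemplars is exactly a 40% class 1 share.

```python
import random


def make_run1a_test_sets(base_class1, base_class2, extra_class2,
                         n_sets=30, n_remove=12, seed=None):
    """Build the 40/60% test sets from the 50/50% baseline test set.

    For each set, n_remove class 1 exemplars are dropped at random and the
    same extra class 2 exemplars are appended.
    """
    rng = random.Random(seed)
    test_sets = []
    for _ in range(n_sets):
        kept_class1 = rng.sample(base_class1, len(base_class1) - n_remove)
        test_sets.append(kept_class1 + list(base_class2) + list(extra_class2))
    return test_sets


# Hypothetical exemplar labels.
class1 = [f"c1_{i:03d}" for i in range(50)]
class2 = [f"c2_{i:03d}" for i in range(50)]
extra = [f"c2_extra_{i}" for i in range(7)]
sets = make_run1a_test_sets(class1, class2, extra, seed=1)
```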

A total of seven distributions were needed to perform the experiment. The distributions for the conditional probabilities, P(1 | 1) and P(2 | 2), and for the overall probability of good classification, P(good), were generated for each run. In addition, a calculated P(good) distribution was generated by using the P matrices of Run 1 and assuming the a priori probabilities P(1) = 0.40 and P(2) = 0.60. The nomenclatures for these distributions are: R1P11, R1P22, R1PG, CR1PG, R1AP11, R1AP22, and R1APG.

If the P matrices are stationary with respect to test data sets, then the following results should be expected. There should be no difference between the P(1 | 1) and P(2 | 2) distributions of the two runs, nor should there be any difference between the calculated P(good) distribution of Run 1 and the P(good) distribution of Run 1a. The difference between the P(good) PDFs of Run 1 and Run 1a should be due only to the change in the a priori probabilities of the exemplar classes in the test data sets.

3.3.3 Controlling P Matrix Symmetry. From the preliminary training runs, it was known that the P matrices for nets trained with a 50/50% mix of exemplar classes were asymmetric: good classification of class 1 exemplars was much poorer than that of class 2 exemplars. If this P matrix were for a communications channel, adjustments would be made so that the channel was symmetric [12:143]. Assuming it would be desirable to have a neural network with a symmetric P matrix, a question arises: is it possible to move the P matrix toward symmetry? This experiment is designed to answer this question.

The P matrix PDFs of Run 1 will be used as the baseline case of trained network responses. Another run of nets, which shall be referred to as Run 2S, was trained with a 60/40% mix of class 1 to class 2 exemplars. The training sets were constructed by randomly removing 17 class 2 exemplars from the baseline training set of Run 1, leaving 85 exemplars (51 class 1 and 34 class 2). A total of 30 such training sets were constructed, each one slightly different with regard to the exact set of class 2 exemplars included. The test set exemplars for all of these nets were identical to the test set used for Run 1. The three new PDFs of interest are the P(1 | 1), P(2 | 2), and P(good) of Run 2S, which will have the nomenclatures R2SP11, R2SP22, and R2SPG.

If the networks are trained harder on class 1 exemplars, as in Run 2S, then the weight space solutions found by the nets should shift to favor class 1 recognition. In terms of the PDFs, the mean of the P(1 | 1) PDF of Run 2S should be greater than that of Run 1, and the mean of the P(2 | 2) PDF of Run 2S should be less than that of Run 1. It is hoped that the means of the two P(good) distributions are essentially the same.

3.3.4 Improvement of Classification Performance via Majority Vote Rule. This experiment is designed to determine if classification performance can be improved by using a majority vote decision rule over three separately trained nets. In addition, the training conditions necessary to achieve this improvement will be explored. The idea proposed here is very similar to the use of redundancy in communications systems to reduce the probability of symbol or bit error. Basically, for a channel having a specified probability of bit error, two redundant bits are sent for every information bit, and each information bit is decided by a majority vote over the three received bits. If the cause of corruption (noise) is independent and uncorrelated during each successive bit transmission, then the probability that two or all three bits are in error will be much less than the error probability of any single bit, provided that probability is below one half. However, to realize the improvement, there must be independence from one trial to the next. The solutions found by the successively trained networks are clearly not totally independent; the back propagation algorithm is seeking the same global minimum on the same error surface on each trial. However, one cannot say that the solutions are totally dependent either, since it has been observed that the nets do not end up in exactly the same place in weight space [9:4-9].
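The redundancy argument can be made concrete with the standard expression for a three-bit repetition code decided by majority vote. Each bit is assumed to be in error independently with probability p; the decision fails when exactly two bits (three ways) or all three bits are wrong. The function name is hypothetical.

```python
def majority_error(p):
    """Error probability of a 3-bit repetition code decided by majority vote.

    Each bit is wrong independently with probability p; the decision is wrong
    when exactly two bits (3 ways) or all three bits are wrong.
    """
    return 3 * p**2 * (1 - p) + p**3
```

For p = 0.1 the majority error is 0.028, and for p = 0.2 it is 0.104, roughly halving the error; at p = 0.5 it stays at 0.5, so the gain vanishes when the single-bit decision is no better than chance. For networks, the same formula only bounds the possible gain, since the nets' errors are not fully independent.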
Four runs of nets trained under different conditions were required for this experiment. The test data sets for these four runs were identical. Recall that Run 2S nets were trained such that the initial weights, the presentation order of training exemplars, and the exact composition of the training data set were varied from one net to the next. An additional 90 nets were trained in this manner. These nets were then used three at a time to construct 30 majority vote nets, as shown in Figure 3.2. The P matrices for these majority vote nets were generated by observing the decisions of each of the single nets for the test set exemplars and determining the final classification by majority vote. At the same time, a calculated majority vote matrix was generated by assuming independence and using the P matrices of the 90 single nets, three at a time. The actual comparisons will only use the P(good) PDF of each of the matrices. This same basic procedure was repeated for Runs 3, 4, and 5: 120 nets were trained, 30 nets were used to construct the single net PDFs, and the other 90 were used to construct the majority vote and calculated majority vote PDFs. The nets of Run 3 were all trained on the same data set, but the initial weights and exemplar presentation orders were different for every net. In Runs 4 and 5, the training data set was the same as in Run 3, but in Run 4 only the initial weights varied, while in Run 5 only the presentation order varied. The nomenclatures for the PDFs to be compared are: R2SPG, R2MPG, CR2MPG, R3SPG, R3MPG, CR3MPG, R4SPG, R4MPG, CR4MPG, R5SPG, R5MPG, and CR5MPG.
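The difference between the observed and the calculated majority vote outcomes can be sketched as follows. All function names are hypothetical. The observed version combines the three nets' actual decisions exemplar by exemplar, so any correlation between their mistakes is reflected in the result; the calculated version assumes independence, as described above. With only two classes, a majority vote is correct for a class whenever at least two of the three nets are correct.

```python
from itertools import product


def majority_vote(votes):
    """Class label (1 or 2) chosen by at least two of three nets."""
    return max(set(votes), key=votes.count)


def observed_majority_p_good(true_labels, decisions_a, decisions_b, decisions_c):
    """P(good) of a majority vote net built from three nets' test decisions."""
    votes = [majority_vote([a, b, c])
             for a, b, c in zip(decisions_a, decisions_b, decisions_c)]
    return sum(v == t for v, t in zip(votes, true_labels)) / len(true_labels)


def calculated_majority_correct(p_correct):
    """P(majority correct) for one class, assuming the three nets are independent.

    p_correct holds the three nets' P(i | i); with two classes the vote is
    correct whenever at least two of the nets are correct.
    """
    total = 0.0
    for outcome in product((True, False), repeat=3):
        if sum(outcome) >= 2:
            prob = 1.0
            for correct, p in zip(outcome, p_correct):
                prob *= p if correct else 1 - p
            total += prob
    return total


def calculated_majority_p_good(p_matrices, prior_1=0.5):
    """Calculated P(good) from three 2x2 P matrices (rows are the true class)."""
    pm11 = calculated_majority_correct([P[0][0] for P in p_matrices])
    pm22 = calculated_majority_correct([P[1][1] for P in p_matrices])
    return prior_1 * pm11 + (1 - prior_1) * pm22
```

If the three nets made identical mistakes, the observed majority P(good) would equal that of a single net, while the calculated value would still show the full independent-trial gain. This is why the calculated PDF serves as an upper reference for the observed one.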
If there is a sufficient degree of independence in the outcomes of the single nets of any of the runs, the mean of the majority vote P(good) PDF should be greater than that of the single nets. This gain in average performance should be some portion of the gain for the calculated majority vote nets. Also, comparisons between the majority vote gains of the different runs should provide some insight into the relative degree to which each controlled condition contributes to the randomness of the weight space solutions found by the nets.

3.3.5 Influence of Training Data Sets, Initial Weights, and Exemplar Presentation Order on Network Solutions. This experiment uses the data obtained while training the nets of Runs 2 through 5. The intent of this experiment is to discover the degree of similarity of the decision regions formed by nets in a given run.

For each network trained in the previous experiment, a list of the exact exemplars incorrectly classified by that net was generated. The lists for the first 30 nets trained in each run will be compiled into a master list for each run. The list will contain the file names of all the exemplars incorrectly classified by any of the 30 nets in a given run. In addition, the exact number of nets in the run that incorrectly classified each exemplar on the list will be recorded.
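Compiling the master list amounts to counting, for each misclassified exemplar, how many nets got it wrong. The sketch below uses `collections.Counter`; the file names and the function name are hypothetical.

```python
from collections import Counter


def master_error_list(misclassified_by_net):
    """Compile a run's master list of misclassified test exemplars.

    misclassified_by_net: one collection of exemplar file names per net.
    Returns (file name, number of nets that misclassified it) pairs,
    most frequently misclassified first.
    """
    counts = Counter()
    for names in misclassified_by_net:
        counts.update(set(names))  # each net counted at most once per exemplar
    return counts.most_common()


# Hypothetical file names for three nets.
master = master_error_list([["sig017", "sig042"], ["sig017"], ["sig017", "sig088"]])
```

In a run of 30 nets, counts clustered near 30 would indicate nets making the same mistakes, while many exemplars with small counts would indicate that the nets formed different decision regions.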

The expected results in this case can only be stated in general terms. If the decision regions formed by the nets of a given run are all relatively similar, then a majority of the test exemplars on the list should have been incorrectly classified by most, if not all, of the networks trained for that run. In other words, if two nets find exactly the same solution, they should make exactly the same mistakes. On the other hand, if two nets find equally good but very different decision regions, then even though each net makes the same number of mistakes, the mistakes made by one may be quite different from those made by the other.

3.3.6 Summary. The previous subsections specified the purpose, training and testing parameters, and expected results of each experiment. The training runs were set up to train the networks and obtain the raw data, which were then used to generate the different distributions required for the various experiments. Each training run had a unique set of training and testing parameters. Table 3.1 summarizes these parameters for all of the training runs described in the preceding paragraphs.

3.4 Analytical Methods.

The quantities dealt with in this thesis are samples of populations. In some cases, the samples being compared may in fact come from the same population; in other cases, from entirely different populations. By analysis and comparison, we hope to discover which of these statements applies to each pair of sample distributions being compared, and the experiments outlined in the previous section were designed with this goal in mind. The conclusions will be drawn from the determination of whether the PDFs being compared come from the same population or from different populations. The method chosen to make this determination is hypothesis testing. The PDFs were tested using a commercial software package, Statistix™, from NH Analytical Software [13], running on an IBM-compatible personal computer. For all comparisons, equality is the standard null hypothesis. Alternative hypotheses may be inequality, less than, or greater than, as fits the particular comparison. The standard level of significance for all tests will be α = 0.05. The software package provides the p-value for all tests. A p-value is the probability, assuming the null hypothesis is true, of observing a difference between the samples at least as large as the one actually observed. Thus, a very low p-value is evidence that the samples are not from the same population. If the p-value falls below α, the null hypothesis is rejected, and one may accept the applicable alternative hypothesis.

Table 3.1. Summary of Experiment Run Parameters

Parameter                               R1        R1A*      R2        R3        R4        R5
# of Nets Trained                       30        *         120       120       120       120
# of Training Exemplars                 102       *         85        102       102       102
Training Set Mix (class 1/class 2)      50/50%    *         60/40%    50/50%    50/50%    50/50%
Identical Training Sets                 Yes       *         No        Yes       Yes       Yes
# of Test Exemplars                     100       95        100       100       100       100
Test Set Mix (class 1/class 2)          50/50%    40/60%    50/50%    50/50%    50/50%    50/50%
Identical Test Sets                     Yes       No        Yes       Yes       Yes       Yes
Generator Seed for Initial Weights      Variable  *         Variable  Variable  Variable  Fixed
Generator Seed for Presentation Order   Variable  *         Variable  Variable  Fixed     Variable
Majority Vote Networks                  No        No        Yes       Yes       Yes       Yes

Distribution Nomenclatures:
R1: R1P11, R1P22, R1PG, CR1PG
R1A: R1AP11, R1AP22, R1APG
R2: R2SP11, R2SP22, R2SPG, R2MP11, R2MP22, R2MPG, CR2MP11, CR2MP22, CR2MPG
R3: R3SP11, R3SP22, R3SPG, R3MP11, R3MP22, R3MPG, CR3MP11, CR3MP22, CR3MPG
R4: R4SP11, R4SP22, R4SPG, R4MP11, R4MP22, R4MPG, CR4MP11, CR4MP22, CR4MPG
R5: R5SP11, R5SP22, R5SPG, R5MP11, R5MP22, R5MPG, CR5MP11, CR5MP22, CR5MPG

* This run tested nets produced in R1 with different test sets.

There are many test statistics available for hypothesis testing. Parametric tests are, in general, more powerful, but require that the samples come from normal populations. Non-parametric tests require only that the observations be independent. As a standard procedure, the results will include a test for normality using the Wilk-Shapiro test statistic [14]. A table of the percentage points for this test statistic is provided in Appendix C. Whenever the assumption of normality holds, a parametric two-sample t-test will be performed; otherwise, non-parametric tests will be used. The non-parametric tests available are the Wilcoxon Signed Rank Test, the Rank Sum Test, the Kruskal-Wallis One-Way Analysis of Variance (AOV), and the Median Test. For further information on hypothesis testing or test statistics, refer to [13], [15], and [16].
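To make the non-parametric comparison concrete, the sketch below implements a two-sample Wilcoxon rank-sum test using the large-sample normal approximation, with average ranks for ties but no tie correction to the variance. It is an illustration under those assumptions, not a reproduction of the Statistix procedure, which may use exact tables or corrections. The function name and the sample values are hypothetical. The decision rule follows the text: reject equality when p < α = 0.05.

```python
from statistics import NormalDist


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, two-sided p-value) for H0: x and y come from the same population.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


ALPHA = 0.05
# Hypothetical P(good) samples; real comparisons would use 30 values per run.
z, p = rank_sum_test([0.78, 0.80, 0.81], [0.84, 0.85, 0.86])
reject_null = p < ALPHA
```

The normal approximation is crude for the three-value example shown but is reasonable for samples of 30 nets, as in the runs described above.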
IV. Results


4.1 Introduction.

This chapter documents the major results of this thesis effort. First, the average training performance histories are presented. These performance curves show how the networks reached their final states. The final states are analyzed more closely in the remaining sections, which compare the various output probability density functions (PDFs) of populations obtained from net responses to test set data.

4.2 Training Performance.

Several metrics are used to evaluate the training performance of neural networks: the error over the training data, the percent of right and good classifications over the training data, and the percent of right and good classifications over the test data. These metrics are usually used to judge the worth of different training algorithms or data sets. All of these metrics were obtained for the 30 nets trained in a given run. The training histories for Run 1 were not recorded because these nets were trained under the same conditions as the nets of Run 3. The histories for Runs 4 and 5 are not shown because they parallel the history for Run 3 so closely that it becomes difficult to distinguish one from the other on the plots. The following sections discuss the training history of each metric for Run 2 and Run 3.
Figure 4.1. Training Histories for Average Total Error

4.2.1 Average Error History. The average error histories over the 30 nets trained in Run 2 and Run 3 are shown in Figure 4.1. The figure shows that by 12,000 iterations the error was well past the knee on both curves and appears to be asymptotically approaching zero. Note that the curve for Run 2 converges toward zero slightly faster than that for Run 3. No analysis was performed to determine if this difference was significant.

4.2.2 Average Right Classification Histories on Training Data. The histories of average percent right classification on training data for Runs 2 and 3 are shown in Figure 4.2. After 1,000 iterations, the nets of both runs average close to 10%, and by 10,000 iterations they are at or near 100%. At 10,000 iterations, both runs were well past the knee of their respective curves.

Figure 4.2. Training Histories of Right Classification on Training Data

4.2.3 Average Good Classification Histories on Training Data. The histories of average percent good classification on training data for Runs 2 and 3 are shown in Figure 4.3. After 1,000 iterations, the nets of Run 2 averaged close to 60%, while Run 3 nets averaged only slightly better than 50%. By 10,000 iterations, both runs were at or near 100% classification and well past the knee of their respective curves.
4.2.4 Average Right Classification Histories on Test Data. Figures 4.4 and
4.5 show the histories of percent right classification on test data for Run 2 and
Run 3, respectively. Since performance on test data is of particular interest
in this thesis, the plots show additional information about the distributions: each
figure plots the average value, the average plus one standard deviation,
and the average minus one standard deviation. After 1,000 iterations, the nets of
Run 2 showed 20% classification accuracy, while the Run 3 nets were closer to 35%. Both
curves are well past the knee at 10,000 iterations. At 20,000 iterations, the average
for Run 3 was slightly better than for Run 2. Finally, the variance in both runs
appears constant after 10,000 iterations, and the variance of Run 2 is considerably
greater than that of Run 3: at 20,000 iterations, the standard deviation was 3.597 for
Run 2 and 1.892 for Run 3.
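The bands plotted in Figures 4.4 through 4.7 are, at each iteration checkpoint, the average over the nets of a run and that average plus and minus one standard deviation. The following is a minimal sketch of that computation, assuming each net's history is stored as a list of percent-correct values sampled at common checkpoints and that the sample (n - 1) standard deviation is intended; the function name, data layout, and example values are hypothetical.

```python
from statistics import mean, stdev


def accuracy_bands(histories):
    """Return (mean, mean - std, mean + std) at each checkpoint.

    histories[k][t] is the percent of test exemplars classified correctly
    by net k at checkpoint t (hypothetical layout).
    """
    n_points = len(histories[0])
    if any(len(h) != n_points for h in histories):
        raise ValueError("all nets must be sampled at the same checkpoints")
    bands = []
    for t in range(n_points):
        values = [h[t] for h in histories]
        m = mean(values)
        s = stdev(values)  # sample standard deviation across nets
        bands.append((m, m - s, m + s))
    return bands


# Hypothetical histories of three nets at two checkpoints.
example = accuracy_bands([[20.0, 78.0], [22.0, 80.0], [24.0, 82.0]])
```

With 30 nets per run, each list would simply hold 30 histories; nothing else in the sketch changes.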

4.2.5 Average Good Classification Histories on Test Data. Figures 4.6 and
4.7 show the histories of percent good classification on test data for Run 2 and Run
3, respectively. As before, each figure plots the average value, the average plus
one standard deviation, and the average minus one standard deviation. After 1,000
iterations, the nets of Run 2 showed just below 60% classification accuracy, while the
Run 3 nets were closer to 70%. For both curves, 10,000 iterations was well past
the knee. Again, after 20,000 iterations, the average for Run 3 was slightly better
than for Run 2. As before, the variance in both runs appears constant after 10,000
iterations, and the variance of Run 2 was considerably greater than that of Run 3: at
20,000 iterations, the standard deviation was 3.739 for Run 2 and 1.604 for Run 3.

Figure 4.4. Training Histories of Right Classification on Test Data for Run 2
Figure 4.5. Training Histories of Right Classification on Test Data for Run 3
Figure 4.6. Training Histories of Good Classification on Test Data for Run 2
Figure 4.7. Training Histories of Good Classification on Test Data for Run 3

4.2.6 Summary of Training Performance. In general, by 10,000 iterations,
the nets of both runs are asymptotically approaching their best performance, and
continued training provided very little gain in any of the performance metrics.
Curiously, even though Run 2 had better error performance than Run 3,
it consistently had lower average classification performance on the test data. The
back propagation of error algorithm works by minimizing the error between the
actual output and the desired output, so one would expect the nets with the
smallest error to yield the best classification performance; the Run 2 results run
counter to that expectation. It was also observed that Run 2 networks exhibited
greater variance of good classification on test data than Run 3 nets did. Finally, in
both runs, the variance of good classification on test data was nearly constant after
10,000 iterations, and further training did not appear to reduce it.

4.3 Characterization of ANN Performance with the P Matrix Model.

This section presents the results of comparing the PDFs of Run 1 and Run
1A. Specifically, the conditional PDFs, P(1|1) and P(2|2), and the joint PDFs,
P(good), of both runs will be compared. Additionally, the calculated P(good) PDF of
Run 1 will be compared with the observed P(good) PDF of Run 1A.
Recall that the calculated PDF was generated by using the observed conditional
PDFs of the nets of Run 1 and assuming a priori probabilities of the test set exemplar
classes to be P(1) = 0.40 and P(2) = 0.60.

Table 4.1. Summary Statistics for Distributions of Run 1 and Run 1A

                                        P(good)     P(good)
Run ID    Metric   P(1|1)    P(2|2)     Observed    Calculated
R1        Mean     0.7380    0.9213     0.8297      --
          STD      --        0.0185     --          --
R1A       Mean     --        0.9304     --          NA
          STD      --        0.0178     --          NA

Difference of Means
R1A - R1           -0.0117   0.0091     0.0191
R1A - CR1                               0.0008

Table 4.1 shows the average and standard deviation of the PDFs of interest.
Additionally, the differences in the average values are shown in the last rows of the
table. The summary statistics are excerpts from Tables A.1 and A.2 located in
Appendix A. The table shows that the largest difference (0.0191) is between the observed
joint PDFs, P(good), of Run 1A and Run 1, while the smallest difference (0.0008) is between
the observed P(good) PDF of Run 1A and the calculated P(good) PDF of Run 1.
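The calculated P(good) described above is the diagonal of the conditional P matrix weighted by the class priors, P(good) = P(1) P(1|1) + P(2) P(2|2). A minimal sketch, applied net by net so that the result is a sample (PDF) rather than a single number; the function names are illustrative, and the only values used are the Run 1 means quoted in Section 5.1.3.

```python
def calculated_p_good(p11, p22, prior1=0.40, prior2=0.60):
    """Joint probability of correct classification implied by a two-class
    conditional P matrix: P(good) = P(1) P(1|1) + P(2) P(2|2)."""
    if abs(prior1 + prior2 - 1.0) > 1e-9:
        raise ValueError("class priors must sum to 1")
    return prior1 * p11 + prior2 * p22


def calculated_p_good_pdf(p11_per_net, p22_per_net, prior1=0.40, prior2=0.60):
    """Apply the model to each net, giving one calculated P(good) per net."""
    if len(p11_per_net) != len(p22_per_net):
        raise ValueError("need one P(1|1) and one P(2|2) value per net")
    return [calculated_p_good(a, b, prior1, prior2)
            for a, b in zip(p11_per_net, p22_per_net)]


# Run 1 means: P(1|1) = 0.7380, P(2|2) = 0.9213.
run1_50_50 = calculated_p_good(0.7380, 0.9213, 0.5, 0.5)  # 50/50 test set
run1_40_60 = calculated_p_good(0.7380, 0.9213)            # 40/60 test set
```

The two calls evaluate to 0.82965 and 0.84798 (up to floating-point rounding); the first agrees with the observed Run 1 mean of 0.8297. Because the model is linear, the mean of the per-net calculated PDF equals the model applied to the mean conditionals; computing it net by net matters only for the spread of the distribution.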

The results of testing each PDF against the null hypothesis that it is
normally distributed are shown in Table 4.2. It can be seen from the table that, at
α = 0.05, the null hypothesis must be rejected for all but two of the distributions.
Based on these results, we are restricted to non-parametric tests of significance.

Table 4.2. Results of Test for Normality for Run 1 and Run 1A Distributions

H0: Samples are from a normal distribution
Criterion: Reject null if WS < 0.927; otherwise, fail to reject

Distribution    Wilk-Shapiro Statistic    Decision
R1P11           .9140                     R
R1P22           .8773                     R
R1PG            .9000                     R
R1AP11          .9093                     R
R1AP22          .8562                     R
R1APG           .9588                     F
CR1APG          .9367                     F

Table 4.3. Results of Hypothesis Tests Between Run 1 and Run 1A Distributions

H0: Samples are from the same distribution
Criterion: Reject null if p < 0.05; otherwise, fail to reject

                  Wilcoxon       Rank Sum      Kruskal-Wallis   Median Test
                  Signed Rank                  One Way AOV
Comparison        p      F/R     p      F/R    p      F/R       p      F/R
R1P11 - R1AP11    .1156  F       .4643  F      .4577  F         .2274  F
R1P22 - R1AP22    .0000  R       .0215  R      .0186  R         .0503  F
R1PG - R1APG      .0000  R       .0001  R      .0001  R         .0001  R
CR1PG - R1APG     .6343  F       .7731  F      .7668  F         .8172  F

Table 4.3 shows the results of testing the null hypothesis that the sample distributions
being compared are actually samples from the same population. Note that
there is a failure to reject the hypothesis for the comparison between the two P(1|1)
PDFs and for the comparison between the calculated P(good) of Run 1 and the observed
P(good) of Run 1A. There is also unanimous agreement among the tests to strongly reject
the hypothesis for the Run 1A and Run 1 P(good) distributions. Unfortunately, the tests
conflict in their decision about the comparison of the P(2|2) PDFs of the two runs:
three of the four tests dictate a rejection of the null hypothesis, while the median
test (p = .0503) narrowly fails to reject it. It was expected that there would be a
failure to reject the null hypothesis in this comparison.
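For readers who want to see what one of these non-parametric tests computes, the following is a minimal, standard-library sketch of the Wilcoxon rank-sum (Mann-Whitney) test for two independent samples. It uses the textbook large-sample normal approximation with a tie correction, which is reasonable for samples of 30; it is not necessarily the exact computation performed by the statistics software used in this study, and the example samples are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation
    with a tie correction. Returns (U statistic of sample x, p value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of t tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0.0:  # every value tied: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return u, p


# Hypothetical P(good) samples for two runs of 30 nets.
run_a = [0.80 + 0.001 * i for i in range(30)]
run_b = [0.85 + 0.001 * i for i in range(30)]
u_ab, p_ab = rank_sum_test(run_a, run_b)
```

Because these two hypothetical samples do not overlap at all, U is 0 and p is far below 0.05. Identical samples give p = 1. The signed rank column of the tables is a different test: it is intended for paired observations, such as the same nets evaluated on two test sets.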
The results presented in this section show no distinguishable difference between
the conditional P(1|1) distributions of the two runs, or between the observed joint
distribution of Run 1A and the calculated joint distribution of Run 1. The tests
show strong evidence that the two P(2|2) PDFs are from different populations,
which is exactly the opposite of what was expected, even though these PDFs had one of
the smallest differences between their averages. However, the sum of their standard
deviations was also the smallest. When the variances of two samples are small,
minute differences in their means can be detected by statistical tests.

4.4 Controlling P Matrix Symmetry.

In  this  section,  the  PDFs  of  Run  1  and  Run  2S  will  be  compared.  The  summary 
statistics  and  results  of  hypothesis  testing  will  be  shown.  Both  runs  were  tested  with 
identical  test  data  sets  having  a  50/50%  mix  of  class  1  and  class  2  exemplars,  but 
the  training  data  sets  for  the  two  runs  were  different.  The  nets  of  Run  1  were  trained 
with  a  50/50%  mix  of  exemplars  and  the  nets  of  Run  2S  were  trained  on  a  60/40% 
mix  of  class  1  to  class  2  exemplars. 

Table 4.4. Summary Statistics for Distributions of Run 1 and Run 2S

Run ID    Metric   P(1|1)    P(2|2)    P(good)
R1        Mean     0.7380    0.9213    0.8297
          STD      --        0.0185    --
R2S       Mean     0.7767    0.8320    0.8043
          STD      --        --        0.0374

Difference of Means
R2S - R1           0.0387    -0.0893   -0.0254

Table 4.5. Results of Test for Normality for Run 1 and Run 2S Distributions

H0: Samples are from a normal distribution
Criterion: Reject null if WS < 0.927; otherwise, fail to reject

Distribution    Wilk-Shapiro Statistic    Decision
R1P11           .9140                     R
R1P22           .8773                     R
R1PG            .9000                     R
R2SP11          .9455                     F
R2SP22          .9118                     R
R2SPG           .9777                     F

In Table 4.4, the averages and standard deviations of the PDFs of Run 1 and
Run 2S are shown. These figures are excerpts from Tables A.1 and A.3 located in
Appendix A. The last row of the table shows the difference in the average values
of the like PDFs of each run. In all three instances, the differences are fairly large
(0.0387, -0.0893, and -0.0254). The reader should note that the P matrix of Run 1 is
skewed in favor of the P(2|2) PDF, whose mean exceeds that of P(1|1) by 0.1833, while in
Run 2S the averages of the P(1|1) and P(2|2) PDFs are only 0.0553 apart and the average
matrix is almost symmetric.
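For a single net, the conditional P matrix can be estimated by row-normalizing its confusion counts on the test set. A minimal sketch with hypothetical counts and illustrative function names. For a two-class matrix whose rows sum to 1, P(1|2) = 1 - P(2|2) and P(2|1) = 1 - P(1|1), so the matrix is symmetric exactly when P(1|1) = P(2|2); the gap between the diagonal entries is therefore a direct measure of asymmetry.

```python
def p_matrix(confusion):
    """Row-normalize a confusion-count matrix.

    confusion[i][j] = number of class-(i+1) test exemplars a net assigned to
    class (j+1). Entry [i][j] of the result estimates P(j+1 | i+1).
    """
    matrix = []
    for row in confusion:
        total = sum(row)
        if total == 0:
            raise ValueError("every true class needs at least one exemplar")
        matrix.append([count / total for count in row])
    return matrix


def diagonal_gap(p):
    """|P(1|1) - P(2|2)|; zero exactly when a 2x2 P matrix is symmetric."""
    return abs(p[0][0] - p[1][1])


# Hypothetical counts for one net on a 50/50 test set of 100 exemplars.
example_p = p_matrix([[30, 20], [5, 45]])
```

Using the run averages instead, the gap is 0.9213 - 0.7380 = 0.1833 for Run 1 and 0.8320 - 0.7767 = 0.0553 for Run 2S, which is the shift toward symmetry described above.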


The results of testing each of the PDFs of interest against the null hypothesis
that they are samples from a normal distribution are shown in Table 4.5. For α =
0.05, the null hypothesis is rejected for all but two of the distributions, and each
pair of like PDFs includes at least one distribution for which normality is rejected.
Since we will compare the like PDFs of the two runs, we therefore restrict ourselves to
non-parametric hypothesis tests. Table 4.6 shows the results of these non-parametric
tests. The null hypothesis assumes the sample distributions being compared come from the
same population. In every case, there is unanimous agreement for strong rejection
of this hypothesis.
Table 4.6. Results of Hypothesis Tests Between Run 1 and Run 2S Distributions

H0: Samples are from the same distribution
Criterion: Reject null if p < 0.05; otherwise, fail to reject

                  Wilcoxon       Rank Sum      Kruskal-Wallis   Median Test
                  Signed Rank                  One Way AOV
Comparison        p      F/R     p      F/R    p      F/R       p      F/R
R1P11 - R2SP11    .0081  R       .0089  R      .0078  R         --     --
R1P22 - R2SP22    .0000  R       .0000  R      .0000  R         --     --
R1PG - R2SPG      .0059  R       .0023  R      .0021  R         --     --

In summary, the results of this section show that all of the PDFs compared
have relatively large differences in their averages. In addition, at a significance level
of α = 0.05, the differences are significant. The tests indicate that it is highly unlikely
that any of the distributions compared actually came from the same population.


4.5 Improvement of Classification Performance via Majority Vote Rule.

This section compares the joint P(good) PDFs of Runs 2 through 5. Specifically,
within each run, the PDFs for the single nets, the observed majority vote nets, and
the calculated majority vote nets will be compared. The calculated majority vote
PDFs represent the expected joint PDF of the outcomes if one assumes independence
between single net outputs. The averages and results of hypothesis testing for these
PDFs will be shown. All runs were tested with identical test data sets having a
50/50% mix of class 1 and class 2 exemplars. However, the training data sets for
Run 2 were different than the training data for Runs 3, 4, and 5. Additionally, the
reader is reminded that the random number generator seeds for Runs 3, 4, and 5 were
controlled parameters.

Table 4.7. Summary of Means for P(good) Distributions of Runs 2 through 5

          PDF ID                          Differences
Run       S         M         CM          M - S      CM - S
R2        0.8043    0.8223    0.8957      --         --
R3        --        0.8200    0.8998      --         --
R4        --        0.8190    0.8910      --         --
R5        --        0.8083    0.8863      -0.0020    0.0760

In Table 4.7, the averages of the P(good) PDFs generated from the runs are
shown. All of the figures are excerpts from Tables A.3 through A.14 located in
Appendix A. The first column indicates the run to which the row applies. For the
other column headers, S stands for single nets, M for observed majority vote, and
CM for calculated majority vote.
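The two majority vote constructs can be illustrated side by side. The observed construct (M) takes the label chosen by at least two of three nets for each test exemplar. The calculated construct (CM) assumes the three nets err independently, so the vote is correct whenever at least two of them are correct. The following is a minimal sketch under those assumptions. The function names and example labels are hypothetical, and the exact procedure behind the Appendix A tables is not reproduced here.

```python
from collections import Counter


def majority_vote(predictions):
    """Per-exemplar label chosen by most of an odd number of nets.

    predictions[k][i] is the class label net k assigned to test exemplar i
    (hypothetical layout). With two classes and an odd number of nets there
    can be no tie.
    """
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of nets")
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(labels, truth):
    """Fraction of exemplars whose assigned label matches the true label."""
    return sum(a == b for a, b in zip(labels, truth)) / len(truth)


def calculated_majority_p_good(p1, p2, p3):
    """P(at least two of three nets are correct), assuming independent errors.

    p1, p2, p3 are the single-net probabilities of correct classification.
    """
    return (p1 * p2 * p3
            + p1 * p2 * (1 - p3)
            + p1 * (1 - p2) * p3
            + (1 - p1) * p2 * p3)


# Hypothetical labels from three nets on four test exemplars.
nets = [[1, 1, 2, 2], [1, 2, 2, 1], [2, 1, 2, 1]]
truth = [1, 1, 2, 2]
voted = majority_vote(nets)
```

Under independence, three nets that are each 80% correct would give a majority vote that is 89.6% correct. When the nets tend to miss the same exemplars, the observed gain falls short of this figure.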

In all cases, the differences between the means of the calculated majority vote PDFs
and the single net PDFs were substantial. These differences should be viewed as the
maximum expected improvement. Thus, the quantity $\frac{M - S}{CM - S} \times 100$
measures how close to this maximum the observed values were. These values
are approximately 19.7%, 7.0%, 1%, and -2.6% for Runs 2 through 5, respectively.
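The ratio above uses only the two difference columns of Table 4.7. A minimal sketch (the function name is illustrative), checked against the legible Run 5 entries, M - S = -0.0020 and CM - S = 0.0760:

```python
def percent_of_maximum_gain(m_minus_s, cm_minus_s):
    """Observed majority-vote gain as a percentage of the calculated gain:
    (M - S) / (CM - S) x 100."""
    if cm_minus_s == 0:
        raise ValueError("CM - S is zero; the ratio is undefined")
    return m_minus_s / cm_minus_s * 100.0


# Run 5 differences from Table 4.7.
run5 = percent_of_maximum_gain(-0.0020, 0.0760)
```

This gives about -2.63, matching the -2.6% quoted for Run 5.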

The results of testing each PDF for normality are shown in Table 4.8. As shown
in the table, we reject the null hypothesis only for the majority vote constructs
(observed and calculated) of Runs 4 and 5. Thus, we restrict ourselves to non-parametric
tests for these runs. However, using parametric tests for Runs 2 and 3 is justified.
Table 4.9 shows the results of the tests of significance for comparisons between the
PDFs of single nets and majority vote nets within each run. The null hypothesis for all
tests was equality. For the comparison between Run 2 single nets and Run 2 observed
majority vote nets, every test rejects the hypothesis, although the p values (.0175 to
.0490) lie only moderately below the 0.05 criterion. For all other runs, there is
unanimous agreement to fail to reject the hypothesis.
Although not shown in the table, the same tests were performed for the differences
between the PDFs of single nets and calculated majority vote nets. They were not
included in Table 4.9 because the results can be stated simply: to four decimal
places, all of the p values were zero. The same result applies to a comparison
between the observed and calculated majority vote PDFs.

Table 4.8. Results of Test for Normality for Distributions of Runs 2 through 5

H0: Samples are from a normal distribution
Criterion: Reject null if WS < 0.927; otherwise, fail to reject

Distribution    Wilk-Shapiro Statistic    Decision
R2SPG           .9777                     F
R2MPG           .9511                     F
CR2MPG          .9662                     F
R3SPG           .9598                     F
R3MPG           .9352                     F
CR3MPG          .9853                     F
R4SPG           .9755                     F
R4MPG           .9155                     R
CR4MPG          .8787                     R
R5SPG           .9616                     F
R5MPG           .8744                     R
CR5MPG          .9174                     R


Table 4.9. Results of Hypothesis Tests Between PDFs of Single and Majority Vote
Nets for Run 2 through Run 5

H0: Samples are from the same distribution
Criterion: Reject null if p < 0.05; otherwise, fail to reject

                 Wilcoxon      Rank Sum     Kruskal-Wallis   Median Test   Two Sample
                 Signed Rank                One Way AOV                    T Test
Comparison       p      F/R    p      F/R   p      F/R       p      F/R    p      F/R
R2SPG - R2MPG    .0490  R      .0315  R     .0297  R         .0175  R      --     --
R3SPG - R3MPG    .1886  F      .2009  F     .1897  F         .1677  F      --     --
R4SPG - R4MPG    .7897  F      .9058  F     .8983  F         .8841  F      NA     NA
R5SPG - R5MPG    .5531  F      .6361  F     .6207  F         .3659  F      NA     NA

In summary, the results of this section show that the difference between the
means of the P(good) distributions for single nets and observed majority vote nets of
Run 2 (an average improvement of 1.8%) is significant at α = 0.05. The difference amounts
to approximately 19.7% of the maximum expected improvement. Conversely, the observed
differences in these PDFs for all other runs are not significant.


4.6 Influence of Training Data Sets, Initial Weights, and Exemplar Presentation
Order on Network Solutions.

In  order  to  gain  insight  into  the  differences  between  the  networks  of  runs  2 
through  5,  we  present  here  an  analysis  of  which  test  exemplars  were  incorrectly 
classified  in  each  run.  Also,  the  number  of  nets  that  incorrectly  classified  each  of 
these  exemplars  is  shown. 

Table A.15 in Appendix A contains the complete list of incorrectly classified
exemplars, along with a count of how many nets misclassified each exemplar. There
were 54 exemplars on the list for Run 2, 34 for Run 3, and 30 each for Runs 4 and 5. The
maximum number of incorrect classifications is 30, since 30 nets were trained
in each run. The lists are ordered from highest to lowest error count.
Since the exemplars incorrectly classified by most, or all, of the 30 nets in a given run are
of particular interest, Table 4.10 shows the top portion of Table A.15.


Table 4.10. Partial List of Incorrectly Classified Test Exemplars in Run 2 Through
Run 5

Run 2S            Run 3S            Run 4S            Run 5S
File     Count    File     Count    File     Count    File     Count
--       30       corr17   30       --       30       --       --
--       29       corr25   30       --       30       --       --
corr194  28       corr37   30       --       30       corr96   30
corr53   27       corr96   30       corr41   30       corr51   30
corr227  27       corr227  30       corr96   30       corr164  30
corr164  26       corr194  30       corr194  30       corr182  30
corr7    24       corr53   29       corr168  29       corr194  30

Table 4.10 shows that three exemplars, corr96, corr25, and corr194, appear in all four
lists. Five of the exemplars appear in two or more lists and four
appear in only one list. One might suspect that these exemplars were abnormal or
perhaps corrupted, and should be discarded. However, a visual inspection of the
graphs of these data files failed to reveal any evidence that this was the case. A
comparison of all the exemplars in the lists of Table A.15 revealed that 23 exemplars
were common to all four lists.
It should now be evident that even when nets are trained with identical
training sets, as in Runs 3 through 5, the solutions found by the individual nets are not all
equivalent. Even if two different nets each classify 20 test exemplars incorrectly, those
exemplars are not necessarily the same; all that is known is that the two solutions are
equivalent with respect to classification performance. What is needed is a measure
of the similarity of the decision regions formed by the networks. In the following
paragraphs, a metric for this purpose is proposed.

For the purpose of illustration, assume that several nets trained for a run have
all found exactly the same solution. Clearly, if the decision region boundaries are
the same, each net would incorrectly classify exactly the same exemplars, and the
error count for each one is equal to the number of nets trained in the run. There is a
maximum correlation between the errors in classification made by the nets. Define
N as the number of networks trained in the run, and E as the number of different
exemplars on the error list. Let C_i be the error count for the ith exemplar on
the list. It should be clear that, in this case, C_i = N for every i, so

$$\sum_{i=1}^{E} \frac{C_i}{NE} = 1 \tag{4.1}$$

Now assume the other extreme: each of the N nets misclassifies only
one test set exemplar, but each net misses a different exemplar. The error list now
contains E = N exemplars, each having an error count of 1. For this situation

$$\sum_{i=1}^{E} \frac{C_i}{NE} = \frac{1}{N} \tag{4.2}$$

This suggests a metric, call it L, of the form

$$L = \sum_{i=1}^{E} \frac{C_i}{NE} \tag{4.3}$$

which is bounded above by 1, for maximum correlation in the error list, and below by
1/N, for minimum correlation (each C_i lies between 1 and N). Note that as N approaches
infinity, L could approach zero in the case of minimum correlation.
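Equation (4.3) is straightforward to compute from a run's error list. A minimal sketch with an illustrative function name, checked against the two limiting cases that motivated the metric:

```python
def error_list_metric(error_counts, n_nets):
    """L = sum(C_i) / (N * E), Eq. (4.3).

    error_counts: C_i for each of the E exemplars on a run's error list
    (exemplars misclassified by at least one net); n_nets: N.
    """
    e = len(error_counts)
    if e == 0:
        raise ValueError("empty error list")
    if any(c < 1 or c > n_nets for c in error_counts):
        raise ValueError("each C_i must lie between 1 and N")
    return sum(error_counts) / (n_nets * e)


# The two limiting cases discussed above, for N = 30 nets.
identical_solutions = error_list_metric([30] * 20, 30)  # Eq. (4.1): L = 1
disjoint_errors = error_list_metric([1] * 30, 30)       # Eq. (4.2): L = 1/N
```

Applied to the full count lists of Table A.15, the same function would give the per-run values of L.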

Applying this metric to the lists in Table A.15, the values of L are approximately
0.362, 0.546, 0.606, and 0.667 for Runs 2, 3, 4, and 5, respectively. Since L is
a measure of the similarity of the decision regions formed by the nets of a given
run, one would expect runs with lower L values to yield greater classification
improvement under a majority vote rule than runs with higher L values. Table 4.9
shows that this is true, with the exception of Run 5.

Figures 4.8 through 4.11 are graphical representations of the counts in Table
A.15. Although the graphs do not show the exemplar labels, there is a one-to-one
correspondence between the columns of the respective graphs and the entries in
Table A.15 (i.e., column 1 of Figure 4.8 is the value of the count for the first entry
in the list for Run 2).
Figure 4.8. Run 2 Incorrectly Classified Exemplar Counts
Figure 4.9. Run 3 Incorrectly Classified Exemplar Counts
Figure 4.10. Run 4 Incorrectly Classified Exemplar Counts
Figure 4.11. Run 5 Incorrectly Classified Exemplar Counts




The figures show, in an intuitive way, the reasons for the results presented in
the previous section. If one chooses any three nets at random from a given
run, the exemplars having error counts lower than 15 (i.e., less than a 0.5 probability
of being incorrectly classified in that run) have a greater than 0.5 probability of being
correctly classified under a majority vote rule taken over the three nets. Figure 4.8
shows that 38 of the 54 exemplars on the Run 2 list, approximately 70%, fall into
this category. For Runs 3, 4, and 5, the proportions of exemplars in this category
are approximately 44%, 40%, and 33%, respectively. Based on these proportions,
it could be predicted that, for a majority vote rule over three nets, Run 2 nets
would yield the greatest improvement in classification performance, followed by Run 3,
Run 4, and finally Run 5. The results presented in Table 4.7
support this prediction.
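The argument above can be made exact under one assumption: the three voting nets are drawn at random, without replacement, from the 30 nets of a run. An exemplar missed by c of the n nets is then misclassified by the vote when at least two of the three drawn nets are among those c, which is a hypergeometric count. A minimal sketch with illustrative function names:

```python
from math import comb


def p_majority_wrong(c, n=30, k=3):
    """Probability that a majority of k nets, drawn at random without
    replacement from the n nets of a run, misclassify an exemplar that
    c of the n nets misclassified (hypergeometric count)."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    if not 0 <= c <= n:
        raise ValueError("c must lie between 0 and n")
    need = k // 2 + 1  # smallest number of wrong votes that decides the vote
    ways_wrong = sum(comb(c, j) * comb(n - c, k - j) for j in range(need, k + 1))
    return ways_wrong / comb(n, k)


def expected_majority_errors(error_counts, n=30, k=3):
    """Expected number of error-list exemplars still misclassified by a
    randomly drawn k-net majority vote."""
    return sum(p_majority_wrong(c, n, k) for c in error_counts)
```

For n = 30 the probability is exactly 0.5 at c = 15, increases with c, and is therefore below 0.5 for every error count under 15. This is the threshold used in the argument above. Feeding the full count lists of Table A.15 to `expected_majority_errors` would turn the proportions into expected error totals.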

4.7 Summary.

This chapter covered the major results of this thesis effort. First, the average
training performance histories for a variety of metrics were presented. This provided
information about how fast the networks trained, the general shape and variances of
the curves, and the relationships between the performance metrics. Next, evidence
demonstrating the stationarity and control of P matrix distributions constructed
from network responses to test data was examined. Then, results of comparisons between
the performance PDFs of single networks and majority vote networks for four
different training runs were examined. Finally, a close inspection of the incorrectly
classified exemplars for each run was performed.
V. Conclusions and Recommendations


This final chapter contains the closing remarks and conclusions based on the
evidence and results presented in Chapter IV. In addition, recommendations for
future research are provided. The reader should be aware that the conclusions presented
here apply only to the two-class recognition problem for the data sets used
in this study. It would be premature to apply these conclusions to ANNs trained on
other data sets, or even to a three-class problem using the same type of data sets.

5.1 Conclusions.

5.1.1  Training  Performance. 

Conclusion.  A  three-layer  back  propagation  neural  network  can 
train  directly  on  correlation  signatures  of  direct  sequence  (DS)  and  linearly  stepped 
frequency  hopped  (FH)  spread  spectrum  signals.  The  networks  can  be  expected  to 
yield  at  or  near  100%  classification  accuracy  on  training  data  sets  and  at  or  near 
80%  accuracy  on  test  data  sets  after  10,000  training  iterations.  Additionally,  the 
overall  standard  deviation  of  test  set  classification  accuracy  can  be  expected  to  be 
less  than  2%. 


Discussion. It is clear from the plots shown in Section 4.2 that
by 10,000 iterations, the nets were at or near their maximum values for all of
the classification performance metrics. Additional training past 10,000 iterations
provided little, if any, improvement in test set classification accuracy, and
no reduction in the variance was observed.


5.1.2 Characterization of ANN Performance with the P Matrix Model.

Conclusion.  The  P  matrix  model  is  a  valid  and  useful  tool  for 
describing  and  evaluating  ANN  classification  performance. 

Discussion. It was shown in Section 4.3 that the P matrix obtained
from testing 30 networks on one test data set accurately predicted the joint
PDF obtained by testing the nets with a data set having a different proportion of
exemplars. There was essentially no difference in the P matrices obtained from testing
the nets with either test data set. While this statement is true for the conditional
PDFs of class 1 exemplars, it may be debatable for the PDFs of class 2 exemplars.
However, it was observed that the nets did very well at recognizing class 2 exemplars
and that these PDFs had much smaller variances. This small variance caused three
of the four tests of significance to reject the null hypothesis that the distributions were
the same, even though there was a relatively small difference in their means. Thus,
it may be argued, from a practical viewpoint, that there was actually no difference
between these PDFs. The implication of these observations is that the responses
of the networks were stationary: the change observed in the average joint P(good)
PDF was due only to the change in the a priori probabilities P(1) and P(2). Thus,
the conditional P matrix is a more useful description of a network's classification
performance than the joint probability, P(good).

5.1.3  Controlling  P  Matrix  Symmetry. 

Conclusion. If training on a particular data set does not
yield the desired conditional classification performance, that performance can be
changed by appropriately adjusting the proportions of exemplar classes in the training
data set.


Discussion. The results in Section 4.4 showed the average conditional
probabilities, P(1|1) and P(2|2), for a particular run of 30 nets to be
0.7380 and 0.9213. The average joint recognition probability was 0.8297. These
nets were clearly not recognizing class 1 exemplars as well as class 2 exemplars. The
training data set proportions were changed from a 50/50% mix to a 60/40% mix of
class 1 to class 2 exemplars, and another run of 30 nets was trained. The resulting
average conditional probabilities, P(1|1) and P(2|2), were 0.7767 and 0.8320,
while the joint probability was 0.8043. Statistical tests on the PDFs confirmed that
the changes were significant. The P matrix was indeed adjusted toward symmetry.
This demonstrates a technique for controlling the symmetry of ANN classification
behavior. However, the trade-off was a reduction in the joint classification
accuracy. This result is not too surprising, because the back propagation of error
algorithm seeks to minimize the error over all training exemplars. If the training
data set is weighted heavily toward one exemplar class, then the solution in
the weight space will shift to favor that class, since it now has a greater contribution
to the overall error. Of course, the exact proportions needed to achieve symmetry
would depend on the relative contribution of each class of exemplars, taken as a group,
to the overall error. While the experimental objective was to adjust the P
matrix toward symmetry, this may not be desirable in all cases. The point is that
a network's classification response can be adjusted to whatever symmetry is best for
a given application.

5.1.4  Improvement  of  Classification  Performance  via  the  Majority  Vote  Rule. 

Conclusion.  If  one  trains  three  networks  with  three  different,  but 
equivalent,  training  data  sets,  it  is  possible  to  use  a  majority  vote  rule  to  realize  an 
improvement  in  average  classification  performance. 

Discussion. In Section 4.5, it was shown that 30 majority vote
networks, constructed from individual nets trained on slightly different data sets,
averaged 1.8% better performance than 30 individual nets trained in the same manner.
Statistical tests on the joint recognition PDFs showed this difference to be
significant. The differences between majority vote net performance and individual net
performance for three other runs of 30 nets trained on identical training data sets were
not significant.
5.1.5 Influence of Training Data Sets, Initial Weights, and Exemplar Presentation
Order on Network Solutions.

Conclusion. The influence of the training data set strongly outweighs
the influence of the initial starting weights and/or the order of presentation of
training exemplars with regard to the decision regions formed by a given network.

Discussion. In Section 4.6, an examination of the incorrectly classified
test set exemplars was performed. A set of 30 nets was trained for each of four
runs. In Run 2, the exact composition of the training set, the initial starting weights,
and the presentation order of exemplars were different for each net trained. In Run 3,
only the initial weights and presentation order were different. In Run 4, only the
initial weights were different, and in Run 5, only the presentation orders were different.
The values of L, a metric measuring the correlation between the decision
regions formed by the nets within a given run, were 0.362, 0.546, 0.606, and 0.667 for
Runs 2, 3, 4, and 5, respectively. These values suggest the relative order
and degree of impact the conditions have on the exact shape of the decision regions
formed: the exact composition of the training data set is
most important, the initial set of weights the next most important, and the order of
exemplar presentation the least important.
5.2  Recommendations.


1.  The  validity  of  the  findings  in  this  thesis  should  be  tested  against  more  complex
recognition  problems.  This  could  be  done  by  obtaining  samples  of  the  correlation
signatures  of  randomly  driven  FH  and  FH/DS  signals,  and  repeating  the
experiments  performed  in  this  study  for  a  four-class  recognition  problem.

2.  For  future  research  involving  this  application  of  ANNs,  an  appropriate  amount
of  white  Gaussian  noise  should  be  added  to  the  simulated  spread-spectrum
signals  before  the  correlation  signatures  are  obtained.  The  classification
performance  of  ANNs  in  noisy  environments  could  then  be  explored.

3.  Investigations  should  be  made  to  determine  if  ANNs  can  learn  to  classify
correlation  signatures  according  to  some  other  parameter,  such  as  chip  rate  or
code  length.

4.  Test  data  sets  used  to  evaluate  ANN  classification  performance  should  be
composed  of  a  uniform  mix  of  at  least  25  exemplars  for  each  exemplar  class.  The  test
data  set  is,  in  effect,  a  measuring  device  for  determining  the  generalized  classification
capabilities  of  trained  networks.  As  a  rule  of  thumb,  if  it  is  desired  to  measure
a  performance  metric  on  a  given  trial  to  the  nearest  ±1/2  unit  of  the  metric,
then  the  measuring  device  should  have  a  tolerance  of  ±1  unit  of  measure.
Additionally,  a  uniform  mix  of  exemplar  classes  will  provide  an  unbiased  estimate
of  classification  accuracy.  It  was  shown  in  this  thesis  that  if  the  P  matrix  for
a  trained  network  is  not  symmetric,  it  is  possible  to  shift  the  value  of  the  joint
classification  performance  significantly  simply  by  biasing  the  test  set  in  favor  of
one  exemplar  class  or  the  other.  Future  research  involving  evaluation  of  ANN
performance  should  report  conditional  classification  accuracies  as  well  as  the
overall  joint  accuracy.
Appendix  A.  Data  Tables


The  following  data  tables  were  constructed  from  the  results  of  network  training
runs.  Networks  were  trained  for  20,000  iterations  and  their  performance  was  tested
against  a  test  data  set.  Exemplars  in  the  test  data  sets  were  not  included  in  the
training  data  sets.  Each  observed  majority  vote  network  was  constructed  by  drawing
three  independently  trained  single  nets  from  a  pool  of  90  without  replacement.  These
nets  voted  on  the  final  classification  of  each  test  exemplar.  Each  calculated  majority
vote  network  was  constructed  from  the  P  matrices  of  the  three  selected  nets.  This
calculated  matrix  assumes  statistical  independence  between  the  outcomes  of  the
individual  nets;  that  is,  there  is  no  relationship  between  the  classification  outcomes  of
any  two  nets  for  a  particular  test  exemplar.  Although  only  the  P(1|1),  P(2|2),  and
P(good)  distributions  are  used,  the  P(2|1)  and  P(1|2)  probabilities  are  shown  for
completeness.
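Under the independence assumption just described, the calculated matrix follows from counting voters: the majority is correct for class j when at least two of the three nets decide j. The Python sketch below spells this out (function names are hypothetical, not those of the thesis software). For three nets that each have P(1|1) = 0.72 and P(2|2) = 0.92, values that occur among the Run 3 single nets in Table A.6, it gives about 0.81 and 0.98, in line with the typical entries of Table A.8.

```python
def majority_correct(p1, p2, p3):
    """P(at least two of three independent nets are correct) for one class.

    p1, p2, p3 are the nets' conditional accuracies for that class, e.g. P(1|1).
    """
    return (p1 * p2 * p3
            + p1 * p2 * (1 - p3)
            + p1 * (1 - p2) * p3
            + (1 - p1) * p2 * p3)


def calculated_mv_matrix(nets):
    """Two-class P matrix of a majority-vote net built from three independent nets.

    nets: three (P(1|1), P(2|2)) pairs.
    """
    if len(nets) != 3:
        raise ValueError("exactly three nets are required")
    p11 = majority_correct(*(n[0] for n in nets))
    p22 = majority_correct(*(n[1] for n in nets))
    return {"P(1|1)": p11, "P(2|1)": 1 - p11,
            "P(1|2)": 1 - p22, "P(2|2)": p22}


m = calculated_mv_matrix([(0.72, 0.92)] * 3)
print({k: round(v, 4) for k, v in m.items()})
```

Because real nets trained on overlapping data tend to make correlated errors, this calculated matrix is best read as an upper bound on what an observed majority vote net can achieve.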
Table  A.1.  Observed  Probability  Matrices  for  Run  1

                                          P(good)
Net       P(1|1)  P(2|1)  P(1|2)  P(2|2)  Observed  Calculated
net1      0.74    0.26    0.08    0.92    0.83      0.848
net2      0.74    0.26    0.08    0.92    0.83      0.848
net3      0.78    0.22    0.08    0.92    0.85      0.864
net4      0.74    0.26    0.08    0.92    0.83      0.848
net5      0.72    0.28    0.08    0.92    0.82      0.840
net6      0.76    0.24    0.04    0.96    0.86      0.880
net7      0.72    0.28    0.06    0.94    0.83      0.852
net8      0.74    0.26    0.08    0.92    0.83      0.848
net9      0.80    0.20    0.10    0.90    0.85      0.860
net10     0.66    0.34    0.08    0.92    0.79      0.816
net11     0.72    0.28    0.08    0.92    0.82      0.840
net12     0.74    0.26    0.06    0.94    0.84      0.860
net13     0.76    0.24    0.04    0.96    0.86      0.880
net14     0.76    0.24    0.10    0.90    0.83      0.844
net15     0.80    0.20    0.12    0.88    0.84      0.848
net16     0.76    0.24    0.08    0.92    0.84      0.856
net17     0.76    0.24    0.12    0.88    0.82      0.832
net18     0.68    0.32    0.08    0.92    0.80      0.824
net19     0.78    0.22    0.10    0.90    0.84      0.852
net20     0.72    0.28    0.08    0.92    0.82      0.840
net21     0.78    0.22    0.08    0.92    0.85      0.864
net22     0.76    0.24    0.08    0.92    0.84      0.856
net23     0.74    0.26    0.08    0.92    0.83      0.848
net24     0.74    0.26    0.06    0.94    0.84      0.860
net25     0.64    0.36    0.06    0.94    0.79      0.820
net26     0.74    0.26    0.06    0.94    0.84      0.860
net27     0.66    0.34    0.10    0.90    0.78      0.804
net28     0.76    0.24    0.08    0.92    0.84      0.856
net29     0.74    0.26    0.08    0.92    0.83      0.848
net30     0.70    0.30    0.06    0.94    0.82      0.844
Mean      0.7380  0.2620  0.0787  0.9213  --        --
STD       0.0384  0.0384  0.0185  0.0185  --        --
Table  A.2.  Observed  Probability  Matrices  for  Run  1a


Table  A.3.  Observed  Probability  Matrices  for  Run  2  Single  Nets

Net       P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)
net1      0.82    0.18    0.30    0.70    0.76
net2      0.86    0.14    0.20    0.80    0.83
net3      0.84    0.16    0.22    0.78    0.81
net4      0.78    0.22    0.16    0.84    0.81
net5      0.74    0.26    0.12    0.88    0.81
net6      0.74    0.26    0.24    0.76    0.75
net7      0.74    0.26    0.10    0.90    0.82
net8      0.78    0.22    0.20    0.80    0.79
net9      0.76    0.24    0.12    0.88    0.82
net10     0.82    0.18    0.18    0.82    0.82
net11     0.76    0.24    0.10    0.90    0.83
net12     0.76    0.24    0.12    0.88    0.82
net13     0.72    0.28    0.22    0.78    0.75
net14     0.74    0.26    0.22    0.78    0.76
net15     0.76    0.24    0.16    0.84    0.80
net16     0.82    0.18    0.14    0.86    0.84
net17     0.74    0.26    0.18    0.82    0.78
net18     0.80    0.20    0.22    0.78    0.79
net19     0.70    0.30    0.18    0.82    0.76
net20     0.80    0.20    0.12    0.88    0.84
net21     0.78    0.22    0.16    0.84    0.81
net22     0.82    0.18    0.10    0.90    0.86
net23     0.72    0.28    0.12    0.88    0.80
net24     0.72    0.28    0.12    0.88    0.80
net25     0.74    0.26    0.32    0.68    0.71
net26     0.72    0.28    0.18    0.82    0.77
net27     0.80    0.20    0.10    0.90    0.85
net28     0.90    0.10    0.14    0.86    0.88
net29     0.76    0.24    0.16    0.84    0.80
net30     0.86    0.14    0.14    0.86    0.86
Mean      --      --      --      --      --
STD       --      --      --      --      --

Table  A.4.  Observed  Probability  Matrices  for  Run  2  Majority  Vote  Nets

Net       P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)
mvnet1    0.84    0.16    0.20    0.80    0.82
mvnet2    0.76    0.24    0.10    0.90    0.83
mvnet3    0.80    0.20    0.14    0.86    0.83
mvnet4    0.82    0.18    0.16    0.84    0.83
mvnet5    0.78    0.22    0.14    0.86    0.82
mvnet6    0.76    0.24    0.08    0.92    0.84
mvnet7    0.80    0.20    0.16    0.84    0.82
mvnet8    0.76    0.24    0.12    0.88    0.82
mvnet9    0.78    0.22    0.16    0.84    0.81
mvnet10   0.82    0.18    0.24    0.76    0.79
mvnet11   0.84    0.16    0.20    0.80    0.82
mvnet12   0.74    0.26    0.14    0.86    0.80
mvnet13   0.78    0.22    0.12    0.88    0.83
mvnet14   0.84    0.16    0.16    0.84    0.84
mvnet15   0.78    0.22    0.12    0.88    0.83
mvnet16   0.80    0.20    0.12    0.88    0.84
mvnet17   0.76    0.24    0.14    0.86    0.81
mvnet18   0.78    0.22    0.12    0.88    0.83
mvnet19   0.78    0.22    0.18    0.82    0.80
mvnet20   0.78    0.22    0.20    0.80    0.79
mvnet21   0.90    0.10    0.20    0.80    0.85
mvnet22   0.80    0.20    0.08    0.92    0.86
mvnet23   0.76    0.24    0.10    0.90    0.83
mvnet24   0.74    0.26    0.18    0.82    0.78
mvnet25   0.78    0.22    0.12    0.88    0.83
mvnet26   0.80    0.20    0.12    0.88    0.84
mvnet27   0.78    0.22    0.16    0.84    0.81
mvnet28   0.76    0.24    0.10    0.90    0.83
mvnet29   0.82    0.18    0.14    0.86    0.84
mvnet30   0.78    0.22    0.18    0.82    0.80
Mean      --      --      --      --      --
STD       --      --      --      --      --

Table  A.5.  Calculated  Probability  Matrices  for  Run  2  Majority  Vote  Nets

Net       P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)
cmvnet1   0.92    0.08    0.09    0.91    --
cmvnet2   0.91    0.09    0.10    0.90    --
cmvnet3   0.86    0.14    0.07    0.93    --
cmvnet4   0.90    0.10    0.06    0.94    0.92
cmvnet5   0.88    0.12    0.11    0.89    0.88
cmvnet6   0.86    0.14    0.05    0.95    0.91
cmvnet7   0.86    0.14    0.09    0.91    0.89
cmvnet8   0.81    0.19    0.03    0.97    0.89
cmvnet9   0.87    0.13    0.10    0.90    0.88
cmvnet10  0.90    0.10    0.13    0.87    0.88
cmvnet11  0.93    0.07    0.09    0.91    0.92
cmvnet12  0.86    0.14    0.10    0.90    0.88
cmvnet13  0.86    0.14    0.06    0.94    0.90
cmvnet14  0.90    0.10    0.07    0.93    0.91
cmvnet15  0.83    0.17    0.10    0.90    0.86
cmvnet16  0.88    0.12    0.07    0.93    0.90
cmvnet17  0.88    0.12    0.07    0.93    0.91
cmvnet18  0.86    0.14    0.04    0.96    0.91
cmvnet19  0.85    0.15    0.13    0.87    0.86
cmvnet20  0.89    0.11    0.14    0.86    0.87
cmvnet21  0.94    0.06    0.09    0.91    0.92
cmvnet22  0.90    0.10    0.07    0.93    0.92
cmvnet23  0.86    0.14    0.05    0.95    0.90
cmvnet24  0.82    0.18    0.08    0.92    0.87
cmvnet25  0.88    0.12    0.10    0.90    0.89
cmvnet26  0.87    0.13    0.05    0.95    0.91
cmvnet27  0.88    0.12    0.07    0.93    0.90
cmvnet28  0.81    0.19    0.03    0.97    0.89
cmvnet29  0.88    0.12    0.11    0.89    0.89
cmvnet30  0.88    0.12    0.11    0.89    0.88
Mean      0.8737  0.1263  0.0822  0.9178  --
STD       0.0313  0.0313  0.0296  0.0296  --
Table  A.6.  Observed  Probability  Matrices  for  Run  3  Single  Nets

Net       P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)
net1      0.68    0.32    0.08    0.92    --
net2      0.70    0.30    0.06    0.94    --
net3      0.78    0.22    0.10    0.90    --
net4      0.70    0.30    0.04    0.96    0.83
net5      0.72    0.28    0.10    0.90    0.81
net6      0.66    0.34    0.06    0.94    0.80
net7      0.72    0.28    0.08    0.92    0.82
net8      0.72    0.28    0.08    0.92    0.82
net9      0.70    0.30    0.08    0.92    0.81
net10     0.72    0.28    0.08    0.92    0.82
net11     0.68    0.32    0.04    0.96    0.82
net12     0.74    0.26    0.08    0.92    0.83
net13     0.68    0.32    0.06    0.94    0.81
net14     0.78    0.22    0.10    0.90    0.84
net15     0.70    0.30    0.08    0.92    0.81
net16     0.70    0.30    0.10    0.90    0.80
net17     0.66    0.34    0.08    0.92    0.79
net18     0.76    0.24    0.08    0.92    0.84
net19     0.68    0.32    0.06    0.94    0.81
net20     0.70    0.30    0.12    0.88    0.79
net21     0.72    0.28    0.10    0.90    0.81
net22     0.72    0.28    0.10    0.90    0.81
net23     0.70    0.30    0.10    0.90    0.80
net24     0.70    0.30    0.06    0.94    0.82
net25     0.66    0.34    0.10    0.90    0.78
net26     0.70    0.30    0.10    0.90    0.80
net27     0.74    0.26    0.10    0.90    0.82
net28     0.76    0.24    0.06    0.94    0.85
net29     0.70    0.30    0.10    0.90    0.80
net30     0.76    0.24    0.12    0.88    0.82
Mean      0.7113  0.2887  --      --      --
STD       0.0325  0.0325  --      --      --

Table  A.7.  Observed  Probability  Matrices  for  Run  3  Majority  Vote  Nets

Net       P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)
mvnet1    0.74    0.26    0.10    0.90    0.82
mvnet2    0.74    0.26    0.10    0.90    0.82
mvnet3    0.70    0.30    0.10    0.90    0.80
mvnet4    0.72    0.28    0.06    0.94    0.83
mvnet5    0.74    0.26    0.08    0.92    0.83
mvnet6    0.68    0.32    0.10    0.90    0.79
mvnet7    0.72    0.28    0.08    0.92    0.82
mvnet8    0.68    0.32    0.08    0.92    0.80
mvnet9    0.72    0.28    0.10    0.90    0.81
mvnet10   0.68    0.32    0.06    0.94    0.81
mvnet11   0.78    0.22    0.04    0.96    0.87
mvnet12   0.72    0.28    0.08    0.92    0.82
mvnet13   0.70    0.30    0.08    0.92    0.81
mvnet14   0.74    0.26    0.10    0.90    0.82
mvnet15   0.72    0.28    0.08    0.92    0.82
mvnet16   0.68    0.32    0.10    0.90    0.79
mvnet17   0.78    0.22    0.08    0.92    0.85
mvnet18   0.70    0.30    0.10    0.90    0.80
mvnet19   0.74    0.26    0.08    0.92    0.83
mvnet20   0.70    0.30    0.06    0.94    0.82
mvnet21   0.76    0.24    0.08    0.92    0.84
mvnet22   0.70    0.30    0.08    0.92    0.81
mvnet23   0.74    0.26    0.08    0.92    0.83
mvnet24   0.76    0.24    0.06    0.94    0.85
mvnet25   0.74    0.26    0.10    0.90    0.82
mvnet26   0.70    0.30    0.10    0.90    0.80
mvnet27   0.72    0.28    0.08    0.92    0.82
mvnet28   0.74    0.26    0.08    0.92    0.83
mvnet29   0.70    0.30    0.08    0.92    0.81
mvnet30   0.74    0.26    0.08    0.92    0.83
Mean      --      --      --      --      --
STD       --      --      --      --      --

Table  A.8.  Calculated  Probability  Matrices  for  Run  3  Majority  Vote  Nets

Net       P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)
cmvnet1   0.83    0.17    --      --      --
cmvnet2   0.82    0.18    --      --      --
cmvnet3   0.79    0.21    0.02    0.98    --
cmvnet4   0.82    0.18    0.01    0.99    --
cmvnet5   0.80    0.20    0.02    0.98    0.89
cmvnet6   0.80    0.20    0.01    0.99    0.89
cmvnet7   0.83    0.17    0.01    0.99    0.91
cmvnet8   0.78    0.22    0.02    0.98    0.88
cmvnet9   0.80    0.20    0.02    0.98    0.89
cmvnet10  0.82    0.18    0.01    0.99    0.90
cmvnet11  0.83    0.17    0.01    0.99    0.91
cmvnet12  0.82    0.18    0.02    0.98    0.90
cmvnet13  0.82    0.18    0.02    0.98    0.90
cmvnet14  0.80    0.20    0.03    0.97    0.89
cmvnet15  0.81    0.19    0.01    0.99    0.90
cmvnet16  0.83    0.17    0.02    0.98    0.90
cmvnet17  0.87    0.13    0.02    0.98    0.92
cmvnet18  0.78    0.22    0.02    0.98    0.88
cmvnet19  0.83    0.17    0.01    0.99    0.91
cmvnet20  0.84    0.16    0.01    0.99    0.92
cmvnet21  0.85    0.15    0.02    0.98    0.91
cmvnet22  0.81    0.19    0.02    0.98    0.90
cmvnet23  0.82    0.18    0.02    0.98    0.90
cmvnet24  0.84    0.16    0.01    0.99    0.91
cmvnet25  0.82    0.18    0.02    0.98    0.90
cmvnet26  0.80    0.20    0.02    0.98    0.89
cmvnet27  0.82    0.18    0.02    0.98    0.90
cmvnet28  0.82    0.18    0.02    0.98    0.90
cmvnet29  0.83    0.17    0.02    0.98    0.91
cmvnet30  0.82    0.18    0.02    0.98    0.90
Mean      --      --      0.0190  0.9810  --
STD       --      --      0.0052  0.0052  --

Table  A.9.  Observed  Probability  Matrices  for  Run  4  Single  Nets


Table  A.10.  Observed  Probability  Matrices  for  Run  4  Majority  Vote  Nets


Table  A.11.  Calculated  Probability  Matrices  for  Run  4  Majority  Vote  Nets

Net       P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)
cmvnet31  0.83    0.17    0.02    0.98    --
cmvnet32  0.84    0.16    0.02    0.98    --
cmvnet33  0.78    0.22    0.02    0.98    --
cmvnet34  0.78    0.22    0.02    0.98    --
cmvnet35  0.78    0.22    0.01    0.99    0.88
cmvnet36  0.81    0.19    0.02    0.98    0.89
cmvnet37  0.83    0.17    0.02    0.98    0.91
cmvnet38  0.78    0.22    0.01    0.99    0.89
cmvnet39  0.82    0.18    0.03    0.97    0.89
cmvnet40  0.77    0.23    0.02    0.98    0.88
cmvnet41  0.83    0.17    0.01    0.99    0.91
cmvnet42  0.80    0.20    0.01    0.99    0.90
cmvnet43  0.78    0.22    0.02    0.98    0.88
cmvnet44  0.79    0.21    0.02    0.98    0.89
cmvnet45  0.82    0.18    0.02    0.98    0.90
cmvnet46  0.84    0.16    0.02    0.98    0.91
cmvnet47  0.82    0.18    0.02    0.98    0.90
cmvnet48  0.79    0.21    0.02    0.98    0.89
cmvnet49  0.78    0.22    0.02    0.98    0.88
cmvnet50  0.79    0.21    0.02    0.98    0.89
cmvnet51  0.80    0.20    0.02    0.98    0.89
cmvnet52  0.76    0.24    0.01    0.99    0.87
cmvnet53  0.81    0.19    0.02    0.98    0.90
cmvnet54  0.77    0.23    0.01    0.99    0.88
cmvnet55  0.83    0.17    0.02    0.98    0.91
cmvnet56  0.82    0.18    0.02    0.98    0.90
cmvnet57  0.78    0.22    0.02    0.98    0.88
cmvnet58  0.78    0.22    0.02    0.98    0.88
cmvnet59  0.78    0.22    0.02    0.98    0.88
cmvnet60  0.78    0.22    0.02    0.98    0.88
Mean      0.7992  0.2008  0.0171  0.9829  --
STD       0.0228  0.0228  --      --      --
Table  A.12.  Observed  Probability  Matrices  for  Run  5  Single  Nets


Table  A.13.  Observed  Probability  Matrices  for  Run  5  Majority  Vote  Nets

Net       P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)
mvnet61   0.70    0.30    0.08    0.92    --
mvnet62   0.70    0.30    0.08    0.92    --
mvnet63   0.70    0.30    0.08    0.92    --
mvnet64   0.70    0.30    0.08    0.92    --
mvnet65   0.74    0.26    0.10    0.90    --
mvnet66   0.68    0.32    0.08    0.92    --
mvnet67   0.70    0.30    0.10    0.90    --
mvnet68   0.66    0.34    0.06    0.94    --
mvnet69   0.70    0.30    0.08    0.92    --
mvnet70   0.72    0.28    0.08    0.92    --
mvnet71   0.66    0.34    0.08    0.92    --
mvnet72   0.74    0.26    0.10    0.90    --
mvnet73   0.70    0.30    0.08    0.92    --
mvnet74   0.72    0.28    0.08    0.92    --
mvnet75   0.70    0.30    0.08    0.92    --
mvnet76   0.76    0.24    0.08    0.92    --
mvnet77   0.66    0.34    0.08    0.92    --
mvnet78   0.70    0.30    0.08    0.92    --
mvnet79   0.68    0.32    0.06    0.94    --
mvnet80   0.70    0.30    0.08    0.92    --
mvnet81   0.70    0.30    0.08    0.92    --
mvnet82   0.70    0.30    0.10    0.90    --
mvnet83   0.70    0.30    0.10    0.90    --
mvnet84   0.68    0.32    0.08    0.92    --
mvnet85   0.68    0.32    0.08    0.92    --
mvnet86   0.70    0.30    0.08    0.92    --
mvnet87   0.70    0.30    0.08    0.92    --
mvnet88   0.66    0.34    0.08    0.92    --
mvnet89   0.70    0.30    0.08    0.92    --
mvnet90   0.74    0.26    0.10    0.90    --
Mean      0.6993  0.3007  0.0827  0.9173  --
STD       0.0239  0.0239  0.0100  0.0100  --

Table  A.14.  Calculated  Probability  Matrices  for  Run  5  Majority  Vote  Nets

Net       P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)
cmvnet61  0.81    0.19    0.02    0.98    0.89
cmvnet62  0.78    0.22    0.02    0.98    0.88
cmvnet63  0.80    0.20    0.02    0.98    0.89
cmvnet64  0.82    0.18    0.02    0.98    0.90
cmvnet65  0.83    0.17    0.02    0.98    0.90
cmvnet66  0.77    0.23    0.02    0.98    0.87
cmvnet67  0.79    0.21    0.02    0.98    0.88
cmvnet68  0.78    0.22    0.01    0.99    0.88
cmvnet69  0.78    0.22    0.02    0.98    0.88
cmvnet70  0.81    0.19    0.02    0.98    0.90
cmvnet71  0.77    0.23    0.02    0.98    0.87
cmvnet72  0.82    0.18    0.02    0.98    0.90
cmvnet73  0.77    0.23    0.02    0.98    0.87
cmvnet74  0.78    0.22    0.02    0.98    0.88
cmvnet75  0.80    0.20    0.02    0.98    0.89
cmvnet76  0.83    0.17    0.02    0.98    0.91
cmvnet77  0.77    0.23    0.02    0.98    0.87
cmvnet78  0.80    0.20    0.02    0.98    0.89
cmvnet79  0.78    0.22    0.02    0.98    0.88
cmvnet80  0.79    0.21    0.02    0.98    0.89
cmvnet81  0.78    0.22    0.02    0.98    0.88
cmvnet82  0.80    0.20    0.02    0.98    0.89
cmvnet83  0.80    0.20    0.02    0.98    0.89
cmvnet84  0.76    0.24    0.02    0.98    0.87
cmvnet85  0.78    0.22    0.02    0.98    0.88
cmvnet86  0.81    0.19    0.02    0.98    0.90
cmvnet87  0.78    0.22    0.02    0.98    0.88
cmvnet88  0.78    0.22    0.01    0.99    0.89
cmvnet89  0.78    0.22    0.02    0.98    0.88
cmvnet90  0.82    0.18    0.02    0.98    0.90
Mean      0.7921  0.2079  0.0195  0.9805  0.8863
STD       0.0196  0.0196  --      --      0.0099
Table  A.15.  Incorrectly  Classified  Test  Exemplars  for  Run  2  Through  Run  5

Run 2S
File       Count
corr96     30
corr25     29
corr194    28
corr53     27
corr227    27
corr164    26
corr7      24
corr182    24
corr37     23
corr145    23
corr170    22
corr168    20
corr17     17
corr92     16
corr241    16
corr82     15
corr19     13
corr31     12
corr57     12
corr90     11
corr41     10
corr80     10
corr86     10
corr88     10
corr64     9
corr66     9
corr51     9
corr35     8
corr84     8
corr62     --
corr70     --
corr254    --
corr29     --
corr27     --
corr176    --
corr243    --
corr68     --
corr72     --
corr190    --
corr205    --
corr233    --
corr149    --
corr200    --
corr202    --
corr231    --
corr235    --
corr43     --
corr147    --
corr39     --
corr151    --
corr211    --
corr213    --
corr223    --
corr225    --

Run 3S
File       Count
corr1?     30
corr25     30
corr37     30
corr96     30
corr227    30
corr194    30
corr53     29
corr7      28
corr51     28
corr164    28
corr182    28
corr168    26
corr169    26
corr145    25
corr41     24
corr241    21
corr29     20
corr19     19
corr57     17
corr82     12
corr35     9
corr39     7
corr176    7
corr205    4
corr200    4
corr31     3
corr128    3
corr27     2
corr190    2
corr13     1
corr43     1
corr59     1
corr178    1
corr201    1

Run 4S
File       Count
corr17     30
corr25     30
corr37     30
corr41     30
corr96     30
corr194    30
corr168    29
corr53     28
corr164    28
corr182    28
corr227    28
corr7      27
corr19     26
corr170    25
corr51     23
corr152    22
corr176    15
corr241    15
corr29     14
corr82     10
corr200    10
corr57     9
corr31     8
corr35     5
corr80     3
corr39     3
corr43     3
corr190    3
corr27     2
corr184    1

Run 5S
File       Count
corr25     30
corr37     30
corr96     30
corr51     30
corr164    30
corr182    30
corr194    30
corr29     29
corr53     29
corr170    29
corr241    29
corr17     27
corr19     27
corr145    27
corr227    26
corr7      22
corr57     21
corr41     19
corr168    19
corr176    19
corr82     13
corr200    --
corr31     --
corr39     --
corr151    --
corr35     --
corr45     --
corr219    --
corr190    --
corr202    --
Appendix  B.  Data  File  Samples  and  Processing  Software

B.1  Preprocessing  of  Correlation  Product  Data  Files.

The  actual  preprocessing  of  the  correlation  product  data  was  done  using  a
commercial  digital  signal  processing  package  called  DADiSP  Worksheet™,  by  DSP
Development  Corporation,  One  Kendall  Square,  Cambridge,  MA  02139.  The  software
package  is  a  graphics-based  spreadsheet  with  a  multi-window  environment.  The
package  has  its  own  Command  File  language  for  automating  processing  tasks.  For
completeness,  the  command  lines  of  the  worksheet  windows  are  shown  here,
followed  by  a  sample  of  a  command  file  used  to  process  the  correlation  data.  Figures
B.1  and  B.2  show  the  before  and  after  plots  of  a  typical  direct  sequence  correlation
signature,  while  Figures  B.3  and  B.4  show  before  and  after  plots  of  a  typical
frequency-hopped  signature.

DADiSP  Worksheet™  algorithm  implemented  in  a  worksheet  called  REDUCE1.

WINDOW  1  :  <file  read  in  here>
WINDOW  2  :  Decimate(W1,2,1)
WINDOW  3  :  Decimate(W1,2,2)
WINDOW  4  :  Avg(W2,W3)
WINDOW  5  :  Abs(W4)  |  fmax
WINDOW  6  :  W4/getpt(W5,curpos(W5))  |  fmax  |  nmove(-25)
WINDOW  7  :  Extract(W6,curpos(W6),50)
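The window chain can be read as a small pipeline. The Python sketch below restates it under explicit assumptions about the DADiSP functions; it is not DADiSP code, and the function name reduce1 is hypothetical. Decimate(W1,2,1) and Decimate(W1,2,2) are taken to keep the odd- and even-numbered samples, so Avg(W2,W3) averages adjacent pairs and halves the record length. fmax is taken to place the cursor at the series maximum, getpt to read the value at the cursor, nmove(-25) to move the cursor 25 points left, and Extract to copy 50 points starting at the cursor. Clipping the window at the record edges is an added assumption. Read this way, a 500-point record becomes a 50-point vector with its normalized peak near the middle, consistent with the axis ranges of Figures B.1 and B.2.

```python
def reduce1(signal, half_width=25, length=50):
    """Hypothetical Python reading of the REDUCE1 worksheet (not DADiSP code)."""
    x = [float(v) for v in signal]
    n = len(x) - len(x) % 2                       # ignore a trailing unpaired sample (assumption)
    w2 = x[0:n:2]                                 # WINDOW 2: Decimate(W1,2,1), samples 1, 3, 5, ...
    w3 = x[1:n:2]                                 # WINDOW 3: Decimate(W1,2,2), samples 2, 4, 6, ...
    w4 = [(a + b) / 2.0 for a, b in zip(w2, w3)]  # WINDOW 4: Avg(W2,W3)
    if len(w4) < length:
        raise ValueError("record too short for the extraction window")
    # WINDOW 5: Abs(W4) | fmax -> cursor at the largest magnitude
    cursor = max(range(len(w4)), key=lambda i: abs(w4[i]))
    scale = abs(w4[cursor])
    if scale == 0.0:
        raise ValueError("record is identically zero")
    # WINDOW 6: W4 / getpt(W5, curpos(W5)), then fmax | nmove(-25)
    w6 = [v / scale for v in w4]
    peak = max(range(len(w6)), key=lambda i: w6[i])
    start = max(0, min(peak - half_width, len(w6) - length))  # edge clipping is an assumption
    # WINDOW 7: Extract(W6, curpos(W6), 50)
    return w6[start:start + length]


# Synthetic 500-point record with a single peak (not thesis data).
record = [0.0] * 500
record[200], record[201] = 4.0, 2.0
vector = reduce1(record)
print(len(vector), vector[25])  # 50 1.0
```

Note that, taken literally, the second fmax searches the signed series; if the largest-magnitude point were negative, the window would center on the largest positive value instead, which may or may not match what DADiSP did in practice.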


Sample  DADiSP  Worksheet  Command  File.

D  D:\corrdat  tier  0  thesis  @cr  W  L  reduce1  @cr
@cntl.home

@f8  CORR6.1  @cr  @cr  writea("fcorr6.dat",w7)  @cr
@f8  CORR7.1  @cr  @cr  writea("fcorr7.dat",w7)  @cr
@f8  CORR9.1  @cr  @cr  writea("fcorr9.dat",w7)  @cr
@f8  CORR10.1  @cr  @cr  writea("fcorr10.dat",w7)  @cr
@f8  CORR12.1  @cr  @cr  writea("fcorr12.dat",w7)  @cr
@f8  CORR13.1  @cr  @cr  writea("fcorr13.dat",w7)  @cr
@f8  CORR14.1  @cr  @cr  writea("fcorr14.dat",w7)  @cr
@f8  CORR15.1  @cr  @cr  writea("fcorr15.dat",w7)  @cr
@f8  CORR16.1  @cr  @cr  writea("fcorr16.dat",w7)  @cr
@f8  CORR17.1  @cr  @cr  writea("fcorr17.dat",w7)  @cr

(Same  pattern  repeated  for  each  corrXX  file)

@f8  CORR52.1  @cr  @cr  writea("fcorr52.dat",w7)  @cr
@f8  CORR53.1  @cr  @cr  writea("fcorr53.dat",w7)  @cr
@f8  CORR54.1  @cr  @cr  writea("fcorr54.dat",w7)  @cr
@f8  CORR55.8  @cr  @cr  writea("fcorr55.dat",w7)  @cr
@f8  CORR56.1  @cr  @cr  writea("fcorr56.dat",w7)  @cr
@f8  CORR57.1  @cr  @cr  writea("fcorr57.dat",w7)  @cr
@f8  CORR58.1  @cr  @cr  writea("fcorr58.dat",w7)  @cr
@f8  CORR59.1  @cr  @cr  writea("fcorr59.dat",w7)  @cr
@f8  CORR60.1  @cr  @cr  writea("fcorr60.dat",w7)  @cr
@esc  Case  @esc  y  @esc  @esc  @esc
Figure  B.1.  Direct  Sequence  Correlation  Product  CORR18  Before  Processing  (horizontal  axis:  Time  Units,  0  to  500)

Figure  B.2.  Direct  Sequence  Correlation  Product  CORR18  After  Processing  (horizontal  axis:  Time  Units,  0  to  50)
Figure  B.3.  Frequency-Hopped  Correlation  Product  CORR148  Before  Processing
Figure  B.4.  Frequency-Hopped  Correlation  Product  CORR148  After  Processing
B.2  Construction  of  Datasets.


This  section  describes  how  the  datasets  used  to  train  the  networks  were
constructed.  The  software  routine  is  written  in  QuickBasic™.  The  user  is  prompted
for  the  number  of  exemplars  to  be  used  for  training,  the  number  of  exemplars  for
testing,  the  number  of  classes  of  exemplars,  and  the  number  of  elements  in  each
exemplar.  The  routine  expects  the  user  to  provide  the  name  of  an  input  file  containing
a  sequence  number,  filename,  and  class  for  each  correlation  signature  to  be
included  in  the  dataset.  This  gives  complete  control  over  the  exact  structure
and  mix  of  exemplars  in  the  training  and  test  datasets.  The  NeuralGraphics  simulator
reserves  the  specified  number  of  exemplars  for  testing  from  the  bottom  of  the
file.  In  other  words,  if  100  test  exemplars  are  specified  for  a  file  containing  a  total
of  250  exemplars,  the  last  100  exemplars  will  be  used  as  the  test  dataset.  Presented
here  are  a  sample  input  file  and  the  source  code  for  the  routine  CONSTRUC.BAS.
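Because the split is purely positional, the order in which files are listed in the input file fixes both the test set and its class mix. The Python sketch below illustrates the "last N exemplars" rule described above; it assumes exemplars are simple (sequence number, class) pairs, and the function name is hypothetical rather than part of NeuralGraphics.

```python
def split_train_test(exemplars, n_test):
    """Reserve the last n_test exemplars for testing, as the simulator is described as doing."""
    if not 0 <= n_test <= len(exemplars):
        raise ValueError("n_test must be between 0 and the number of exemplars")
    cut = len(exemplars) - n_test
    return exemplars[:cut], exemplars[cut:]


# 250 exemplars whose classes alternate 1, 2, 1, 2, ... (odd sequence numbers are class 1).
exemplars = [(seq, 1 if seq % 2 else 2) for seq in range(1, 251)]
train, test = split_train_test(exemplars, 100)
print(len(train), len(test), test[0][0], test[-1][0])     # 150 100 151 250
print(sum(c == 1 for _, c in test), sum(c == 2 for _, c in test))  # 50 50
```

Alternating the classes in the input list, as in the sample input file, is one simple way to guarantee the uniform class mix that Recommendation 4 calls for.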


Sample  input  file  for  constructing  a  dataset.

1,  "d:\data\corrdat1\fcorr6.dat",  1
2,  "d:\data\corrdat4\fcorr61.dat",  2
3,  "d:\data\corrdat1\fcorr9.dat",  1
4,  "d:\data\corrdat4\fcorr63.dat",  2
5,  "d:\data\corrdat1\fcorr12.dat",  1
6,  "d:\data\corrdat4\fcorr65.dat",  2
7,  "d:\data\corrdat1\fcorr14.dat",  1
8,  "d:\data\corrdat4\fcorr67.dat",  2
9,  "d:\data\corrdat1\fcorr16.dat",  1
10,  "d:\data\corrdat4\fcorr69.dat",  2

(Same  pattern  repeated  for  each  file  included  in  dataset)

192,  "d:\data\corrdat4\fcorr233.dat",  2
193,  "d:\data\corrdat1\fcorr194.dat",  1
194,  "d:\data\corrdat4\fcorr235.dat",  2
195,  "d:\data\corrdat1\fcorr196.dat",  1
196,  "d:\data\corrdat4\fcorr237.dat",  2
197,  "d:\data\corrdat1\fcorr198.dat",  1
198,  "d:\data\corrdat4\fcorr239.dat",  2
199,  "d:\data\corrdat1\fcorr200.dat",  1
200,  "d:\data\corrdat4\fcorr241.dat",  2
201,  "d:\data\corrdat1\fcorr202.dat",  1
202,  "d:\data\corrdat4\fcorr243.dat",  2


Source  Code  for  CONSTRUC

'Program:  CONSTRUC.BAS
'Author:  John  U.  DeBerry
'
'Description:  This  routine  reads  an  ASCII  data  file  called
'names$  consisting  of  multiple  lines  of  a  sequence  number,
'a  filename,  and  a  class  number.  The  named  files  contain  50
'element  correlation  product  vectors  in  a  column.  Each  vector
'is  read  and  then  written  to  a  super  data  file  of  the  name
'specified  by  the  user.  The  format  of  the  super  file  is:
'
'  #training,  #test,  #elements,  #classes   (header line)
'  Sequence  #1,  element(0),  element(1),  ...,  element(49)
'  Class  #
'  Sequence  #2,  element(0),  element(1),  ...,  element(49)
'  Class  #
'  Sequence  #3,  element(0),  element(1),  ...,  element(49)
'  Class  #
'  ...
'  EOF
'
'This  super  file  will  be  used  as  an  input  file  for  the
'NeuralGraphics  simulator  written  by  Greg  Tarr.

INPUT  "What  is  the  file  containing  the  name  &  class  data";  names$
CLS
INPUT  "What  shall  I  name  the  super  data  file";  super$
CLS
INPUT  "How  many  files  are  to  be  training  vectors";  exemplars%
CLS
INPUT  "How  many  are  to  be  test  vectors";  texemplars%
CLS
INPUT  "How  many  classes  of  vectors  are  there";  outelements%
CLS
INPUT  "How  many  elements  per  vector";  inelements%
CLS

DIM  vector!(inelements%)

PRINT  "Name  file  -  ";  names$
PRINT  "Super  file  -  ";  super$
PRINT
PRINT  "File  being  processed  -  ";

OPEN  super$  FOR  OUTPUT  AS  #1
OPEN  names$  FOR  INPUT  AS  #2

'  Header  line:  training  count,  test  count,  elements  per  vector,  classes
PRINT  #1,  exemplars%;  texemplars%;  inelements%;  outelements%

DO  UNTIL  EOF(2)
    INPUT  #2,  number%,  file$,  class%
    LOCATE  4,  26
    PRINT  file$

    '  Read  one  correlation  vector  (one  value  per  line)
    OPEN  file$  FOR  INPUT  AS  #3
    FOR  i  =  0  TO  (inelements%  -  1)
        INPUT  #3,  vector!(i)
    NEXT  i
    CLOSE  #3

    '  Sequence  number  and  vector  elements  on  one  line...
    PRINT  #1,  number%;
    FOR  i  =  0  TO  (inelements%  -  1)
        PRINT  #1,  vector!(i);  "  ";
    NEXT  i
    PRINT  #1,

    '  ...then  the  class  number  on  the  next  line
    PRINT  #1,  class%
LOOP

CLOSE

END
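For readers without a QuickBasic environment, the same file-assembly logic can be sketched in Python. This is a restatement under assumptions, not the author's code: the function name and the read_vector callback are hypothetical, the input list is parsed with the standard csv module (quoted filenames, comma separators), and number formatting and spacing will differ from what QuickBasic's PRINT # produces.

```python
import csv
import io


def build_super_file(names_text, read_vector, out, n_train, n_test, n_classes, n_elements):
    """Write a CONSTRUC-style super file to the text stream `out`.

    names_text  -- lines of: sequence number, "filename", class
    read_vector -- callable returning the list of values stored in a filename
    """
    # Header line, in the same order as the QuickBasic PRINT #1 statement.
    out.write(f"{n_train} {n_test} {n_elements} {n_classes}\n")
    reader = csv.reader(io.StringIO(names_text), skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # skip blank lines
        seq, filename, cls = int(row[0]), row[1].strip(), int(row[2])
        vector = read_vector(filename)
        if len(vector) != n_elements:
            raise ValueError(f"{filename}: expected {n_elements} values, got {len(vector)}")
        # Sequence number and elements on one line, class number on the next.
        out.write(str(seq) + " " + " ".join(f"{v:g}" for v in vector) + "\n")
        out.write(f"{cls}\n")


# Tiny in-memory example with two-element vectors (toy values, not thesis data).
names = '1, "a.dat", 1\n2, "b.dat", 2\n'
vectors = {"a.dat": [0.5, 1.0], "b.dat": [0.25, -1.0]}
buf = io.StringIO()
build_super_file(names, vectors.__getitem__, buf, n_train=1, n_test=1, n_classes=2, n_elements=2)
print(buf.getvalue())
```

Passing the file reader in as a callable keeps the sketch testable in memory; in practice it would open each listed file and read one value per line.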
B.3  Processing  of  NeuralGraphics  Output.

This  section  shows  samples  of  the  actual  data  files  output  by  the  NeuralGraphics
software:  one  file  containing  the  results  of  testing  a  network  with  the  test  data,  and
one  file  containing  the  training  history  data.  Following  that,  the  QuickBasic  source
code  for  several  routines  used  to  process  the  data  is  shown.  The  routines  PTABLE_2.BAS
and  MV3TABLE.BAS  operate  on  the  network  performance  data,  while  GOOD.BAS
operates  on  the  history  files.  The  routine  THEORY.BAS  operates  on  the  output  files  of
PTABLE_2.BAS.  The  first  three  routines  expect  two  things:  a  file  containing  a  list  of
filenames  to  operate  on,  and  the  actual  files  specified  by  that  list.  THEORY.BAS  uses
the  names  in  the  list  as  names  for  the  P  matrices  it  constructs  from  the  matrices  found
in  the  matrix  table  it  operates  on.  This  was  done  simply  to  avoid  mixing  up  the
matrices.
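The P matrix entries reported in Appendix A can be tallied directly from a network output file such as net10.dat. PTABLE_2.BAS is not reproduced in full here, so the following is only a Python sketch of that tally. It reads P(i|j) as the fraction of true-class-j test exemplars that the net assigned to class i, which is consistent with P(1|1) + P(2|1) = 1 in the tables. The function names are hypothetical.

```python
from collections import Counter


def parse_net_output(text):
    """Return (exemplar, true_class, decision) tuples; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and all(p.isdigit() for p in parts):
            rows.append(tuple(int(p) for p in parts))
    return rows


def p_matrix(rows, classes=(1, 2)):
    """Return ({(i, j): P(i|j)}, P(good)), where P(i|j) = P(net decides i | true class j)."""
    per_class = Counter(t for _, t, _ in rows)
    joint = Counter((d, t) for _, t, d in rows)
    P = {(i, j): joint[(i, j)] / per_class[j]
         for i in classes for j in classes if per_class[j]}
    p_good = sum(joint[(c, c)] for c in classes) / len(rows)
    return P, p_good


# The first eleven rows of the sample net10.dat file.
sample = """Exemplar #   True Class   Net Decision
103 1 2
104 2 2
105 1 1
106 2 2
107 1 1
108 2 2
109 1 1
110 2 2
111 1 2
112 2 2
113 1 2
"""
P, p_good = p_matrix(parse_net_output(sample))
print(P[(1, 1)], P[(2, 1)], P[(1, 2)], P[(2, 2)], round(p_good, 3))
```

On this eleven-row fragment the sketch gives P(1|1) = P(2|1) = 0.5, P(1|2) = 0, P(2|2) = 1, and P(good) = 8/11. These are toy numbers from a fragment of the file, not a reported result.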


Sample  output  file  from  NeuralGraphics  simulator
net10.dat

Exemplar #   True Class   Net Decision
103          1            2
104          2            2
105          1            1
106          2            2
107          1            1
108          2            2
109          1            1
110          2            2
111          1            2
112          2            2
113          1            2
114          2            2
115          1            1
116          2            2
117          1            1
118          2            2
119          1            2
120          2            2
121          1            1
122          2            2
123          1            1
124          2            2
125          1            1
126          2            2
127          1            1
128          2            2
129          1            1
130          2            2
131          1            2
132          2            2
133          1            1
134          2            2
135          1            2
136          2            2
137          1            1
138          2            2
139          1            1
140          2            2
141          1            1
142          2            2
143          1            1
144          2            1
145          1            2
146          2            2
147          1            2
148          2            2
149          1            1
150          2            2
151          1            2
152          2            1
153          1            1
154          2            2
155          1            1
156          2            2
157          1            1
158          2            2
159          1            1
160          2            2
161          1            1
162          2            2
163          1            2
164          2            2
165          1            1
166          2            2
167          1            2
168          2            2
169          1            2
170          2            2
171          1            1
172          2            2
173          1            1
174          2            2
175          1            1
176          2            2
177          1            1
178          2            2
179          1            1
180          2            2
181          1            2
182          2            2
183          1            1
184          2            2
185          1            1
186          2            1
187          1            1
188          2            2
189          1            1
190          2            2
191          1            1
192          2            2
193          1            2
194          2            2
195          1            1
196          2            2
197          1            1
Sample history file produced by NeuralGraphics simulator

Training history - net10

Seed - 620962919

                    Training            Test
 Count    Error    Right    Good     Right    Good    #learned
  1000     7.61    11.80    48.80    30.00    62.00       0
  2000     5.53    68.50    85.90    65.00    79.00       1
  3000     3.64    84.50    92.40    73.00    82.00      25
  4000     1.83    96.20    98.70    76.00    84.00      78
  5000     1.23    97.20    99.20    76.00    82.00      93
  6000     0.43    99.70   100.00    78.00    81.00      99
  7000     0.24   100.00   100.00    79.00    81.00     100
  8000     0.17   100.00   100.00    79.00    82.00     100
  9000     0.15   100.00   100.00    79.00    82.00     100
 10000     0.13   100.00   100.00    79.00    82.00     100
 11000     0.12   100.00   100.00    80.00    82.00     100
 12000     0.11   100.00   100.00    80.00    82.00     100
 13000     0.11   100.00   100.00    80.00    82.00     100
 14000     0.10   100.00   100.00    79.00    82.00     100
 15000     0.10   100.00   100.00    80.00    82.00     100
 16000     0.09   100.00   100.00    80.00    82.00     100
 17000     0.09   100.00   100.00    80.00    82.00     100
 18000     0.09   100.00   100.00    80.00    82.00     100
 19000     0.08   100.00   100.00    80.00    82.00     100
 20000     0.08   100.00   100.00    80.00    82.00     100
EOF
'Program:  PTABLE_2.BAS
'Author:   John W. DeBerry

'This routine reads the data file containing the final
'classifications yielded by a net on the test set exemplars.
'It then calculates the P matrix for that net.  Multiplying
'the values of the P matrix by 100 yields the actual observed
'percent correct (or wrong) performance of the net.  The
'routine writes this info to a file, one row for each net
'data file listed in the name$ file.  If the user specifies LOTUS
'format, the result is a table ready for import into LOTUS
'for computing the average and STD of each column (P(1|1),
'P(2|1), P(1|2), P(2|2), and P(good)).

OPTION BASE 1
REM $DYNAMIC
LOCATE 23, 2
INPUT "What source file for the data file names"; name$
CLS
LOCATE 23, 2
INPUT "Filename for probability matrix table"; matrix$
CLS
LOCATE 23, 2
INPUT "Filename for out of class vector log"; verror$
CLS

DO
    LOCATE 23, 2
    INPUT "Do you want Lotus type file"; a$
LOOP UNTIL a$ = "y" OR a$ = "n"

CLS
LOCATE 12, 31
PRINT "WORKING..."

'vectornum% holds up to 50 misclassified exemplar numbers per net
DIM testinfo%(3), cerror%(2), vectornum%(50), classcount%(2)
DIM prob!(5)

OPEN matrix$ FOR OUTPUT AS #1
OPEN name$ FOR INPUT AS #2
OPEN verror$ FOR OUTPUT AS #3

IF a$ = "n" THEN
    PRINT #1, "Net ID  P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)"
    PRINT #1, "------------------------------------------------"
END IF

PRINT #3, "Net ID  Out-of-Class vectors"
PRINT #3, "----------------------------"

DO UNTIL EOF(2)
    FOR i = 1 TO 2
        cerror%(i) = 0
        classcount%(i) = 0
    NEXT i
    wrong% = 0
    INPUT #2, net$

    OPEN net$ FOR INPUT AS #4
    LINE INPUT #4, junk$            'skip the header line
    DO UNTIL EOF(4)
        'testinfo%(1) = exemplar #, (2) = true class, (3) = net decision
        INPUT #4, testinfo%(1), testinfo%(2), testinfo%(3)
        IF testinfo%(2) = 1 THEN
            classcount%(1) = classcount%(1) + 1
            IF testinfo%(2) <> testinfo%(3) THEN
                wrong% = wrong% + 1
                vectornum%(wrong%) = testinfo%(1)
                cerror%(1) = cerror%(1) + 1
            END IF
        ELSEIF testinfo%(2) = 2 THEN
            classcount%(2) = classcount%(2) + 1
            IF testinfo%(2) <> testinfo%(3) THEN
                wrong% = wrong% + 1
                vectornum%(wrong%) = testinfo%(1)
                cerror%(2) = cerror%(2) + 1
            END IF
        END IF
    LOOP
    CLOSE #4

    count% = classcount%(1) + classcount%(2)
    prob!(2) = cerror%(1) / classcount%(1)     'P(2|1)
    prob!(3) = cerror%(2) / classcount%(2)     'P(1|2)
    prob!(1) = 1 - prob!(2)                    'P(1|1)
    prob!(4) = 1 - prob!(3)                    'P(2|2)
    prob!(5) = (classcount%(1) / count%) * prob!(1) + (classcount%(2) / count%) * prob!(4)

    'Net ID (file name without its 4-character extension), padded
    PRINT #3, LEFT$(net$, LEN(net$) - 4);
    IF LEN(net$) - 4 = 4 THEN PRINT #3, SPC(4);
    IF LEN(net$) - 4 = 5 THEN PRINT #3, SPC(3);
    IF LEN(net$) - 4 = 6 THEN PRINT #3, SPC(2);
    FOR i = 1 TO wrong%
        PRINT #3, vectornum%(i);
    NEXT i
    PRINT #3,

    IF a$ = "n" THEN
        PRINT #1, LEFT$(net$, LEN(net$) - 4),
        FOR i = 1 TO 5
            PRINT #1, prob!(i); SPC(4);
        NEXT i
        PRINT #1,
    ELSEIF a$ = "y" THEN
        PRINT #1, CHR$(34); LEFT$(net$, LEN(net$) - 4); CHR$(34);
        FOR i = 1 TO 5
            PRINT #1, prob!(i);
        NEXT i
        PRINT #1,
    END IF
LOOP
CLS
CLOSE
END
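The per-class counting in PTABLE_2.BAS can be restated compactly. The Python sketch below is an illustration only, not the author's code: it assumes a decision file laid out like the sample NeuralGraphics output file above (one header line, then exemplar number, true class, and net decision on each line) and two classes labeled 1 and 2. The names `parse_decision_file`, `p_matrix`, and `Row` are hypothetical.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, int, int]  # (exemplar number, true class, net decision)


def parse_decision_file(text: str) -> List[Row]:
    """Skip the one-line header, then read three integers per non-blank line."""
    rows: List[Row] = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if fields:
            exemplar, true_class, decision = (int(f) for f in fields[:3])
            rows.append((exemplar, true_class, decision))
    return rows


def p_matrix(rows: Iterable[Row]) -> Tuple[List[float], List[int]]:
    """Return ([P(1|1), P(2|1), P(1|2), P(2|2), P(good)], out-of-class exemplars).

    Classes are assumed to be 1 or 2, and each class must occur at least once.
    """
    count = {1: 0, 2: 0}   # exemplars seen per true class
    errors = {1: 0, 2: 0}  # exemplars of each true class the net got wrong
    wrong: List[int] = []  # misclassified exemplar numbers, in file order
    for exemplar, true_class, decision in rows:
        count[true_class] += 1
        if decision != true_class:
            errors[true_class] += 1
            wrong.append(exemplar)
    p21 = errors[1] / count[1]  # class 1 exemplars called class 2
    p12 = errors[2] / count[2]  # class 2 exemplars called class 1
    p11, p22 = 1 - p21, 1 - p12
    total = count[1] + count[2]
    p_good = (count[1] / total) * p11 + (count[2] / total) * p22
    return [p11, p21, p12, p22, p_good], wrong
```

Because each weight in P(good) is a class count divided by the total, P(good) reduces to the overall fraction of test exemplars classified correctly. For a file with three class 1 exemplars (one misclassified) and two class 2 exemplars (none misclassified), the sketch returns P(1|1) = 2/3, P(2|2) = 1, and P(good) = 0.8.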
'Program:  MV3TABLE.BAS
'Author:   John W. DeBerry

'This routine takes three net decision data files and
'constructs the P matrix for the equivalent majority
'vote network.  The P matrix for this equivalent network
'is then written in row format to matrix$.  The matrix$
'file can be imported directly into LOTUS.

OPTION BASE 1
REM $DYNAMIC
LOCATE 23, 2
INPUT "What source file for majority vote file names"; name$
CLS
LOCATE 23, 2
INPUT "Filename for probability matrix table"; matrix$
CLS
LOCATE 23, 2
INPUT "Filename for out of class vector log"; verror$
CLS

DO
    LOCATE 23, 2
    INPUT "Do you want Lotus type file"; a$
LOOP UNTIL a$ = "y" OR a$ = "n"

CLS
LOCATE 12, 31
PRINT "WORKING..."

DIM testinfo%(3), cerror%(2), vectornum%(50), classcount%(2)
DIM testinfo1%(3), testinfo2%(3), testinfo3%(3)
DIM prob!(5)

OPEN matrix$ FOR OUTPUT AS #1
OPEN name$ FOR INPUT AS #2
OPEN verror$ FOR OUTPUT AS #3

IF a$ = "n" THEN
    PRINT #1, "Net ID  P(1|1)  P(2|1)  P(1|2)  P(2|2)  P(good)"
    PRINT #1, "------------------------------------------------"
END IF

PRINT #3, "Net ID  Out-of-Class vectors"
PRINT #3, "----------------------------"

DO UNTIL EOF(2)
    FOR i = 1 TO 2
        cerror%(i) = 0
        classcount%(i) = 0
    NEXT i
    wrong% = 0

    'Each entry in name$ gives three single-net decision files and
    'the name of the majority vote net built from them.
    INPUT #2, net1$, net2$, net3$, mvnet$

    OPEN net1$ FOR INPUT AS #4
    OPEN net2$ FOR INPUT AS #5
    OPEN net3$ FOR INPUT AS #6
    LINE INPUT #4, junk$            'skip the header line of each file
    LINE INPUT #5, junk$
    LINE INPUT #6, junk$

    DO UNTIL EOF(6)
        INPUT #4, testinfo1%(1), testinfo1%(2), testinfo1%(3)
        INPUT #5, testinfo2%(1), testinfo2%(2), testinfo2%(3)
        INPUT #6, testinfo3%(1), testinfo3%(2), testinfo3%(3)
        testinfo%(1) = testinfo1%(1)
        testinfo%(2) = testinfo1%(2)

        'Decisions are 1 or 2, so the vote sum runs from 3 to 6;
        'a sum of 4 or less means at least two nets chose class 1.
        vote% = testinfo1%(3) + testinfo2%(3) + testinfo3%(3)
        IF vote% <= 4 THEN
            testinfo%(3) = 1
        ELSEIF vote% >= 5 THEN
            testinfo%(3) = 2
        END IF

        IF testinfo%(2) = 1 THEN
            classcount%(1) = classcount%(1) + 1
            IF testinfo%(2) <> testinfo%(3) THEN
                wrong% = wrong% + 1
                vectornum%(wrong%) = testinfo%(1)
                cerror%(1) = cerror%(1) + 1
            END IF
        ELSEIF testinfo%(2) = 2 THEN
            classcount%(2) = classcount%(2) + 1
            IF testinfo%(2) <> testinfo%(3) THEN
                wrong% = wrong% + 1
                vectornum%(wrong%) = testinfo%(1)
                cerror%(2) = cerror%(2) + 1
            END IF
        END IF
    LOOP
    CLOSE #4
    CLOSE #5
    CLOSE #6

    count% = classcount%(1) + classcount%(2)
    prob!(2) = cerror%(1) / classcount%(1)
    prob!(3) = cerror%(2) / classcount%(2)
    prob!(1) = 1 - prob!(2)
    prob!(4) = 1 - prob!(3)
    prob!(5) = (classcount%(1) / count%) * prob!(1) + (classcount%(2) / count%) * prob!(4)

    PRINT #3, LEFT$(mvnet$, LEN(mvnet$) - 4);
    IF LEN(mvnet$) - 4 = 6 THEN PRINT #3, SPC(4);
    IF LEN(mvnet$) - 4 = 7 THEN PRINT #3, SPC(3);
    IF LEN(mvnet$) - 4 = 8 THEN PRINT #3, SPC(2);
    FOR i = 1 TO wrong%
        PRINT #3, vectornum%(i);
    NEXT i
    PRINT #3,

    IF a$ = "n" THEN
        PRINT #1, LEFT$(mvnet$, LEN(mvnet$) - 4),
        FOR i = 1 TO 5
            PRINT #1, prob!(i); SPC(4);
        NEXT i
        PRINT #1,
    ELSEIF a$ = "y" THEN
        PRINT #1, CHR$(34); LEFT$(mvnet$, LEN(mvnet$) - 4); CHR$(34);
        FOR i = 1 TO 5
            PRINT #1, prob!(i);
        NEXT i
        PRINT #1,
    END IF
LOOP
CLS
CLOSE
END
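The voting rule in MV3TABLE.BAS relies on decisions being coded 1 or 2, so the sum of the three decisions identifies the majority without counting votes explicitly. A minimal Python sketch of that rule, assuming the three decision lists are already parsed into (exemplar, true class, decision) tuples in matching exemplar order; `majority_vote` and `Row` are hypothetical names, not part of the original software.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, int, int]  # (exemplar number, true class, net decision)


def majority_vote(net1: Sequence[Row], net2: Sequence[Row],
                  net3: Sequence[Row]) -> List[Row]:
    """Combine three nets' decisions on the same exemplars into one decision list."""
    combined: List[Row] = []
    for r1, r2, r3 in zip(net1, net2, net3):
        # Each decision is 1 or 2, so the sum is 3..6; a sum of 4 or less
        # means at least two of the three nets chose class 1.
        vote = r1[2] + r2[2] + r3[2]
        decision = 1 if vote <= 4 else 2
        # Exemplar number and true class are taken from the first net, as in
        # the QuickBasic listing.
        combined.append((r1[0], r1[1], decision))
    return combined
```

Like the QuickBasic loop, which stops at the end of the third file, `zip` stops at the end of the shortest list. The combined rows have the same layout as a single net's decision file, so the majority vote net's P matrix follows from the same per-class counting used in PTABLE_2.BAS.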




'Program:  MVTHEORY.BAS
'Author:   John W. DeBerry

'This routine constructs the THEORETICAL majority vote
'P matrix from the P matrices of three nets.  The computations
'assume independence.  The output file matrix$ is row oriented
'and can be directly imported into LOTUS.  The three nets used
'to generate each majority vote matrix are read from name$.  The
'first majority vote net is constructed from the first three
'net matrices in the single net P matrix table, the next from
'the next three matrices in the table, and so on.  The total number
'of rows in the single net matrix table should be divisible by three.
'The program reads netname$ to assign a name to each majority
'vote net (for flexibility of file naming).  The user must know
'the a priori probabilities of the test set exemplars.

OPTION BASE 1
REM $DYNAMIC
LOCATE 23, 2
INPUT "What source file for single net matrix"; name$
CLS
LOCATE 23, 2
INPUT "Filename for probability matrix table"; matrix$
CLS
LOCATE 23, 2
INPUT "What are the a priori probabilities (class 1, class 2)"; p1, p2
CLS
LOCATE 23, 2
INPUT "What is the file containing the names of the mvnets"; netname$
CLS
LOCATE 12, 31
PRINT "WORKING..."

DIM n1(5), n2(5), n3(5)
DIM prob!(5)

OPEN matrix$ FOR OUTPUT AS #1
OPEN name$ FOR INPUT AS #2
OPEN netname$ FOR INPUT AS #3

DO UNTIL EOF(2)
    INPUT #3, mvnet$

    'Three table rows: net ID, P(1|1), P(2|1), P(1|2), P(2|2), P(good)
    INPUT #2, junk$, n1(1), n1(2), n1(3), n1(4), n1(5)
    INPUT #2, junk$, n2(1), n2(2), n2(3), n2(4), n2(5)
    INPUT #2, junk$, n3(1), n3(2), n3(3), n3(4), n3(5)

    'Class 1 is called correctly if all three nets are right or exactly two are
    prob!(1) = n1(1) * n2(1) * n3(1) + n1(1) * n2(1) * n3(2) + n1(1) * n2(2) * n3(1) + n1(2) * n2(1) * n3(1)
    prob!(2) = 1 - prob!(1)

    'Same structure for class 2, using P(2|2) and P(1|2)
    prob!(4) = n1(4) * n2(4) * n3(4) + n1(4) * n2(4) * n3(3) + n1(4) * n2(3) * n3(4) + n1(3) * n2(4) * n3(4)
    prob!(3) = 1 - prob!(4)

    prob!(5) = p1 * prob!(1) + p2 * prob!(4)

    PRINT #1, CHR$(34); mvnet$; CHR$(34);
    FOR i = 1 TO 5
        PRINT #1, prob!(i);
    NEXT i
    PRINT #1,
LOOP

CLS
CLOSE
END
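Under the independence assumption stated in MVTHEORY.BAS, the majority vote classifies a class 1 exemplar correctly when all three nets do or when exactly two of them do. The Python sketch below restates the same products term by term. It uses 0-based indexing, so `n1[0]` corresponds to `n1(1)` = P(1|1) in the listing; the name `mv_theory` and the argument layout are illustrative assumptions.

```python
from typing import List, Sequence


def mv_theory(n1: Sequence[float], n2: Sequence[float], n3: Sequence[float],
              p1: float, p2: float) -> List[float]:
    """Theoretical majority-vote P matrix from three single-net P matrices.

    Each input is [P(1|1), P(2|1), P(1|2), P(2|2), ...]; p1 and p2 are the
    a priori class probabilities of the test set.  The nets' decisions are
    assumed to be independent.
    """
    # Class 1: all three nets right, or exactly one of them wrong.
    p11 = (n1[0] * n2[0] * n3[0] + n1[0] * n2[0] * n3[1]
           + n1[0] * n2[1] * n3[0] + n1[1] * n2[0] * n3[0])
    # Class 2: same structure using P(2|2) and P(1|2).
    p22 = (n1[3] * n2[3] * n3[3] + n1[3] * n2[3] * n3[2]
           + n1[3] * n2[2] * n3[3] + n1[2] * n2[3] * n3[3])
    p_good = p1 * p11 + p2 * p22
    return [p11, 1 - p11, 1 - p22, p22, p_good]
```

For three identical nets with P(1|1) = 0.8, the sketch gives 0.8^3 + 3(0.8^2)(0.2) = 0.896 for the majority vote. That gain depends on the nets' errors really being independent, which is what comparing this theoretical matrix with the observed one from MV3TABLE.BAS tests.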
'Program:  GOOD.BAS
'Author:   John W. DeBerry

'This routine reads stripped output from the NeuralGraphics
'simulator and keeps only the desired information.  It writes
'the P(good) history of each net, one row per net, to the file
'training$.  A row is written for each net listed in the file
'filename$.  The training$ file can then be used with LOTUS to get
'the average and STD value at each 1000 iteration interval up to 20000.

DIM count(20), aerr!(20), tright!(20), tgood!(20), right!(20), good!(20), learn(20)

INPUT "What filename contains the history file names"; filename$
CLS
INPUT "What filename for the processed data"; training$
CLS

OPEN filename$ FOR INPUT AS #1
OPEN training$ FOR OUTPUT AS #3

PRINT #3, CHR$(34); "Count"; CHR$(34);
PRINT #3,

DO UNTIL EOF(1)
    INPUT #1, name$
    OPEN name$ FOR INPUT AS #2

    FOR i = 1 TO 5                  'skip the five header lines
        LINE INPUT #2, stuff$
    NEXT i

    FOR i = 0 TO 19                 'twenty checkpoints, 1000 to 20000
        INPUT #2, count(i), aerr!(i), tright!(i), tgood!(i), right!(i), good!(i), learn(i)
    NEXT i

    PRINT #3, CHR$(34); LEFT$(name$, LEN(name$) - 4); CHR$(34);
    FOR i = 0 TO 19
        PRINT #3, good!(i);
    NEXT i
    PRINT #3,

    CLOSE #2
LOOP
CLOSE
END
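GOOD.BAS leaves the averaging to LOTUS. As an illustration only, the two steps (extract each net's P(good) history, then compute the mean and standard deviation at each 1000-iteration checkpoint) might look like this in Python. The function names are hypothetical; the sixth number on each data row is taken as P(good), matching the `good!` variable in the listing. Rather than skipping a fixed five header lines, this sketch keeps only lines made up entirely of numbers.

```python
import statistics
from typing import Dict, List, Tuple


def good_history(text: str) -> List[float]:
    """Return the sixth number (test-set P(good)) from each data row of a history file."""
    values: List[float] = []
    for line in text.splitlines():
        try:
            numbers = [float(f) for f in line.split()]
        except ValueError:
            continue  # title, seed, or column-heading line
        if len(numbers) >= 7:
            values.append(numbers[5])
    return values


def checkpoint_stats(histories: Dict[str, List[float]]) -> List[Tuple[float, float]]:
    """Mean and sample standard deviation of P(good) at each checkpoint across nets.

    Needs at least two nets, because statistics.stdev needs two values.
    """
    columns = zip(*histories.values())
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]
```

Swap in `statistics.pstdev` if the population form of the standard deviation is wanted.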
Appendix C.  Table of Percentage Points of the Wilk-Shapiro
Statistic  (reproduced  from  [14]) 
Table 6.  Percentage points of the W test* for n = 3(1)50

                               Level
  n     0.01    0.02    0.05    0.10    0.50    0.90
  3    0.753   0.756   0.767   0.789   0.959   0.998
  4    0.687   0.707   0.748   0.792   0.935   0.987
  5    0.686   0.715   0.762   0.806   0.927   0.979
  6    0.713   0.743   0.788   0.826   0.927   0.974
  7    0.730   0.760   0.803   0.838   0.928   0.972
  8    0.749   0.778   0.818   0.851   0.932   0.972
  9    0.764   0.791   0.829   0.859   0.935   0.972
 10    0.781   0.806   0.842   0.869   0.938   0.972
 11    0.792   0.817   0.850   0.876   0.940   0.973
 12    0.805   0.828   0.859   0.883   0.943   0.973
 13    0.814   0.837   0.866   0.889   0.945   0.974
 14    0.825   0.846   0.874   0.895   0.947   0.975
 15    0.835   0.855   0.881   0.901   0.950   0.975
 16    0.844   0.863   0.887   0.906   0.952   0.976
 17    0.851   0.869   0.892   0.910   0.954   0.977
 18    0.858   0.874   0.897   0.914   0.956   0.978
 19    0.863   0.879   0.901   0.917   0.957   0.978
 20    0.868   0.884   0.905   0.920   0.959   0.979
 21    0.873   0.888   0.908   0.923   0.960   0.980
 22    0.878   0.892   0.911   0.926   0.961   0.980
 23    0.881   0.895   0.914   0.928   0.962   0.981
 24    0.884   0.898   0.916   0.930   0.963   0.981
 25    0.888   0.901   0.918   0.931   0.964   0.981
 26    0.891   0.904   0.920   0.933   0.965   0.982
 27    0.894   0.906   0.923   0.935   0.965   0.982
 28    0.896   0.908   0.924   0.936   0.966   0.982
 29    0.898   0.910   0.926   0.937   0.966   0.982
 30    0.900   0.912   0.927   0.939   0.967   0.983
 31    0.902   0.914   0.929   0.940   0.967   0.983
 32    0.904   0.915   0.930   0.941   0.968   0.983
 33    0.906   0.917   0.931   0.942   0.968   0.983
 34    0.908   0.919   0.933   0.943   0.969   0.983
 35    0.910   0.920   0.934   0.944   0.969   0.984
 36    0.912   0.922   0.935   0.945   0.970   0.984
 37    0.914   0.924   0.936   0.946   0.970   0.984
 38    0.916   0.925   0.938   0.947   0.971   0.984
 39    0.917   0.927   0.939   0.948   0.971   0.984
 40    0.919   0.928   0.940   0.949   0.972   0.985
 41    0.920   0.929   0.941   0.950   0.972   0.985
 42    0.922   0.930   0.942   0.951   0.972   0.985
 43    0.923   0.932   0.943   0.951   0.973   0.985
 44    0.924   0.933   0.944   0.952   0.973   0.985
 45    0.926   0.934   0.945   0.953   0.973   0.985
 46    0.927   0.935   0.945   0.953   0.974   0.985
 47    0.928   0.936   0.946   0.954   0.974   0.985
 48    0.929   0.937   0.947   0.954   0.974   0.985
 49    0.929   0.937   0.947   0.955   0.974   0.985
 50    0.930   0.938   0.947   0.955
* Based on fitted Johnson (1949) S_B approximation; see Shapiro & Wilk (1965a) for details.
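The table is used as a lower-tail test: small values of W indicate departure from normality, so normality is rejected at level alpha when the computed W falls below the tabled point for the sample size. A minimal Python sketch of that decision rule, with the n = 30 row transcribed from Table 6 (n = 30 is chosen only as an example). The names `W_CRITICAL_N30` and `rejects_normality` are hypothetical, and computing W itself (for instance with `scipy.stats.shapiro`) is not shown.

```python
from typing import Dict

# Lower-tail percentage points of W for n = 30, transcribed from Table 6.
W_CRITICAL_N30: Dict[float, float] = {0.01: 0.900, 0.02: 0.912, 0.05: 0.927, 0.10: 0.939}


def rejects_normality(w: float, alpha: float,
                      critical: Dict[float, float] = W_CRITICAL_N30) -> bool:
    """True if W is below the tabled point, i.e. normality is rejected at level alpha."""
    return w < critical[alpha]
```

For example, with 30 samples a computed W of 0.92 would reject normality at alpha = 0.05 (0.92 < 0.927) but not at alpha = 0.01 (0.92 >= 0.900).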